soumilshah1995 commented on issue #10110:
URL: https://github.com/apache/hudi/issues/10110#issuecomment-2250710861

   Just updating this thread: I ran a small test.
   
   # Before Index Creation 
   ```
   spark.read.format("hudi") \
       .option("hoodie.enable.data.skipping", "true") \
       .option("hoodie.metadata.enable", "true") \
       .option("hoodie.metadata.index.column.stats.enable", "true") \
       .load(path) \
       .createOrReplaceTempView("snapshots")
   
   
   spark.sql("""
   SELECT * FROM snapshots
   
   """).printSchema()
   result = spark.sql("""
   SELECT 
       event_type, user_id, event_id
   FROM
       snapshots
   WHERE 
       date_format(timestamp, 'yyyy-MM-dd') = '2023-06-17'
   """)
   
   
   result.show()
   result.explain(True)
   ```
   
![image](https://github.com/user-attachments/assets/d6af8bcb-e960-4c69-8a7e-5e84f3c64745)
   
   
   # After Index Creation
   ```
   query_create_ts_datestr = """
   CREATE INDEX IF NOT EXISTS ts_datestr 
   ON 
       web_events 
   USING 
       column_stats(timestamp) 
   OPTIONS(func='from_unixtime', format='yyyy-MM-dd')
   """
   result = spark.sql(query_create_ts_datestr)
   
   
   
   result = spark.sql("""
   SELECT 
       event_type, user_id, event_id
   FROM
       web_events
   WHERE 
       date_format(timestamp, 'yyyy-MM-dd') = '2023-06-17'
   """)
   
   
   result.show()
   result.explain(True)
   ```
   
   
![image](https://github.com/user-attachments/assets/16777f4b-7fb0-47e6-9683-fb579f85febd)
   
   
   I do see a difference in query time: the query is faster after creating the index.
   What else should I check to confirm this is working as expected?
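
A single timed run can be skewed by caching or JVM warm-up, so the before/after comparison is easier to trust when each query is run several times. Below is a minimal sketch of such a timing helper. `time_query` and its parameters are hypothetical names introduced here for illustration and are not part of Hudi or Spark. Timing is only a rough signal and does not by itself prove that the index was used.

```python
import statistics
import time


def time_query(run, repeats=5, warmup=1):
    """Return the median wall-clock seconds of calling `run()` `repeats` times.

    `run` is any zero-argument callable, for example a lambda that
    executes a Spark query and collects its result.
    """
    # Warm-up calls are executed but not measured.
    for _ in range(warmup):
        run()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)


# Hypothetical usage against a live Spark session (not executed here);
# `before_sql` and `after_sql` stand for the two SELECT statements above:
# before = time_query(lambda: spark.sql(before_sql).collect())
# after = time_query(lambda: spark.sql(after_sql).collect())
# print(f"before={before:.3f}s after={after:.3f}s")
```

Running both queries through the same helper, in the same session, keeps the two measurements comparable.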

