soumilshah1995 commented on issue #10110:
URL: https://github.com/apache/hudi/issues/10110#issuecomment-2250710861
Just updating this thread: I ran a small test.
# Before Index Creation
```
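# Assumes an active SparkSession `spark` and the Hudi table base path in `path`.
# Read the table with the metadata table and data skipping enabled.
spark.read.format("hudi") \
    .option("hoodie.enable.data.skipping", "true") \
    .option("hoodie.metadata.enable", "true") \
    .option("hoodie.metadata.index.column.stats.enable", "true") \
    .load(path) \
    .createOrReplaceTempView("snapshots")

# Print the schema of the view.
spark.sql("""
    SELECT * FROM snapshots
""").printSchema()

# Baseline query, run before the index exists.
result = spark.sql("""
    SELECT
        event_type, user_id, event_id
    FROM
        snapshots
    WHERE
        date_format(timestamp, 'yyyy-MM-dd') = '2023-06-17'
""")
result.show()
result.explain(True)  # extended explain: logical and physical plans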
```

# After Index Creation
```
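# Create the index `ts_datestr` on the `timestamp` column via column_stats.
# (Plain string: there is nothing to interpolate, so no f-prefix is needed.)
query_create_ts_datestr = """
    CREATE INDEX IF NOT EXISTS ts_datestr
    ON
        web_events
    USING
        column_stats(timestamp)
    OPTIONS(func='from_unixtime', format='yyyy-MM-dd')
"""
result = spark.sql(query_create_ts_datestr)

# Same query as before, now run against the `web_events` table.
result = spark.sql("""
    SELECT
        event_type, user_id, event_id
    FROM
        web_events
    WHERE
        date_format(timestamp, 'yyyy-MM-dd') = '2023-06-17'
""")
result.show()
result.explain(True)  # extended explain: logical and physical plans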
```

I do see a difference in query time: the query is faster after creating the
index. What else should I check to confirm that this is working as expected?
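
One possible check, beyond timing, is to compare the filter attributes that Spark prints on the file-scan node in the two `explain(True)` outputs. The sketch below is only an illustration: `extract_scan_filters` is a hypothetical helper (not part of Hudi or Spark), and it assumes the plan text contains the usual `PartitionFilters: [...]`, `PushedFilters: [...]` and `DataFilters: [...]` entries with no nested square brackets.

```python
import re

# Attributes that Spark's file-scan node typically prints in a physical plan.
FILTER_KEYS = ("PartitionFilters", "PushedFilters", "DataFilters")


def extract_scan_filters(plan_text):
    """Return {attribute: [contents of each bracketed list]} from a plan string.

    Hypothetical helper for eyeballing plans; it does simple text matching
    and assumes the bracketed lists contain no nested "]".
    """
    found = {key: [] for key in FILTER_KEYS}
    for key in FILTER_KEYS:
        # Non-greedy match from "Key: [" up to the first closing bracket.
        for match in re.finditer(rf"{key}: \[(.*?)\]", plan_text):
            found[key].append(match.group(1))
    return found
```

The plan text can be copied from the `explain(True)` output, or (assumption: this relies on PySpark's internal `_jdf` attribute) obtained with `result._jdf.queryExecution().toString()`. The scan node's "number of files read" metric in the SQL tab of the Spark UI is another thing to compare between the two runs. Note also that the index above is defined with `from_unixtime` while the filter uses `date_format`; whether that filter is matched to the index is not confirmed here.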