boneanxs commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1606013767
> if you could attach the query plan for before and after this change, it would be helpful.
There is no query plan difference between before and after, since all filters
are pushed down to Hudi in both cases; before this PR, however, some of those
filters did not take effect.
I tested a table with 50,000 partitions (region, date, hour) and logged the time
cost in `org.apache.hudi.SparkHoodieTableFileIndex#tryListByPartitionPathPrefix`:
```scala
private def tryListByPartitionPathPrefix(partitionColumnNames: Seq[String],
                                         partitionColumnPredicates: Seq[Expression]) = {
  // Static partition-path prefix is defined as a prefix of the full partition-path where only
  // first N partition columns (in-order) have proper (static) values bound in equality predicates,
  // allowing in turn to build such prefix to be used in subsequent filtering
  val startTime = System.currentTimeMillis()
  //...
  log.info(s"Time cost to listing files: ${System.currentTimeMillis() - startTime}ms")
  result
}
```
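To make the "static partition-path prefix" from the code comment concrete, here is a minimal sketch in plain Scala. It is not Hudi's implementation: `staticPrefix` is a hypothetical helper, equality predicates are modelled as a simple column-to-value map, hive-style `col=value` path segments are assumed, and the region value `us` is made up for illustration.

```scala
// Hypothetical helper, not part of Hudi: builds a static partition-path prefix
// from equality predicates modelled as a column -> value map.
object StaticPrefixSketch {
  def staticPrefix(partitionColumns: Seq[String], equalities: Map[String, String]): String =
    partitionColumns
      .takeWhile(equalities.contains)          // stop at the first column without an equality predicate
      .map(col => s"$col=${equalities(col)}")  // assumes hive-style "col=value" path segments
      .mkString("/")

  def main(args: Array[String]): Unit = {
    val cols = Seq("region", "date", "hour")
    // Leading columns bound: a usable prefix exists.
    println(staticPrefix(cols, Map("region" -> "us", "date" -> "2023-06-20")))
    // Only `date` bound: the first column (region) is unbound, so the prefix is empty.
    println(staticPrefix(cols, Map("date" -> "2023-06-20")).isEmpty)
  }
}
```

With partition columns (region, date, hour), a filter on `date` alone yields an empty prefix under this definition, because the prefix must start at the first partition column.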
With the filter `date=date"2023-06-20` pushed down, I ran the query 3 times in
`local[10]` mode; the listing time drops noticeably with this PR.
### Before the PR
```
23/06/25 18:09:11 INFO HoodieFileIndex: Time cost to listing files: 42745ms
23/06/25 18:12:04 INFO HoodieFileIndex: Time cost to listing files: 37495ms
23/06/25 18:15:14 INFO HoodieFileIndex: Time cost to listing files: 43496ms
```
### After the PR
```
23/06/25 18:19:35 INFO HoodieFileIndex: Time cost to listing files: 10928ms
23/06/25 18:20:29 INFO HoodieFileIndex: Time cost to listing files: 10015ms
23/06/25 18:21:25 INFO HoodieFileIndex: Time cost to listing files: 12032ms
```
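For a quick summary of the six log lines, the small sketch below averages the two sets of timings and derives the speedup. The object and method names are illustrative only; the numbers are copied from the logs.

```scala
// Illustrative summary of the listing times reported in the logs (milliseconds).
object ListingTimeSummary {
  val beforeMs: Seq[Long] = Seq(42745L, 37495L, 43496L)
  val afterMs: Seq[Long]  = Seq(10928L, 10015L, 12032L)

  def mean(xs: Seq[Long]): Double = xs.sum.toDouble / xs.size

  // Ratio of mean listing time before the PR to mean listing time after it.
  def speedup: Double = mean(beforeMs) / mean(afterMs)

  def main(args: Array[String]): Unit = {
    println(s"mean before: ${mean(beforeMs).round} ms")
    println(s"mean after:  ${mean(afterMs).round} ms")
    println(s"speedup:     ${(speedup * 100).round / 100.0}x")
  }
}
```

Over these three runs each, the mean listing time goes from roughly 41.2 s to roughly 11.0 s, about a 3.75x speedup (around 73% less time).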
Since my backend storage is `HDFS`, I think the PR could save even more time
on an object store.