codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165076903
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##########
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
// prefix to try to reduce the scope of the required file-listing
val relativePartitionPathPrefix =
composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
- if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+ if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {
Review Comment:
The `fs.exists` call is costly and will add latency to every listing. How often do we run
into this scenario? The FS cache is invalidated on each refresh anyway, so I am
wondering whether we really need an `fs.exists` check every time.
Can we not simply catch the exception and continue?
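To illustrate the alternative suggested above, here is a minimal Scala sketch contrasting check-then-list with listing directly and treating a missing path as empty. It assumes the listing call signals a missing path with `java.io.FileNotFoundException` (as Hadoop's `FileSystem.listStatus` does); `ListingSketch`, `Lister`, `listWithExistsCheck` and `listOrEmpty` are hypothetical names for illustration, not Hudi APIs.

```scala
import java.io.FileNotFoundException

object ListingSketch {
  // Hypothetical stand-in for a file-system listing call; assumed to throw
  // FileNotFoundException when the path does not exist.
  type Lister = String => Seq[String]

  /** Check-then-list: an existing path costs two calls (exists + list). */
  def listWithExistsCheck(exists: String => Boolean, list: Lister)(path: String): Seq[String] =
    if (exists(path)) list(path) else Seq.empty

  /** List directly and treat a missing path as an empty result: one call either way. */
  def listOrEmpty(list: Lister)(path: String): Seq[String] =
    try list(path)
    catch { case _: FileNotFoundException => Seq.empty }
}
```

With the direct approach an existing partition prefix costs one file-system call instead of two, and a missing prefix still costs only one.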