xushiyan commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r1022548772


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##########
@@ -237,70 +246,64 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
     }
   }
 
-  private def listMatchingPartitionPathsInternal(partitionColumnNames: Seq[String],
-                                                 partitionColumnPredicates: Seq[Expression]): Seq[PartitionPath] = {
-    // NOTE: Here we try to to achieve efficiency in avoiding necessity to recursively list deep folder structures of
-    //       partitioned tables w/ multiple partition columns, by carefully analyzing provided partition predicates:
-    //
-    //       In cases when partition-predicates have
-    //         - The form of equality predicates w/ static literals (for ex, like `date = '2022-01-01'`)
-    //         - Fully specified proper prefix of the partition schema (ie fully binding first N columns
-    //           of the partition schema adhering to hereby described rules)
-    //
-    // We will try to exploit this specific structure, and try to reduce the scope of a
-    // necessary file-listings of partitions of the table to just the sub-folder under relative prefix
-    // of the partition-path derived from the partition-column predicates. For ex, consider following
-    // scenario:
-    //
-    // Table's partition schema (in-order):
-    //
-    //    country_code: string (for ex, 'us')
-    //    date: string (for ex, '2022-01-01')
-    //
-    // Table's folder structure:
-    //    us/
-    //     |- 2022-01-01/
-    //     |- 2022-01-02/
-    //     ...
-    //
-    // In case we have incoming query specifies following predicates:
-    //
-    //    `... WHERE country_code = 'us' AND date = '2022-01-01'`
-    //
-    // We can deduce full partition-path w/o doing a single listing: `us/2022-01-01`
-    if (areAllPartitionPathsCached || !shouldUsePartitionPathPrefixAnalysis(configProperties)) {
-      logDebug("All partition paths have already been cached, use it directly")
+  // NOTE: Here we try to to achieve efficiency in avoiding necessity to recursively list deep folder structures of
+  //       partitioned tables w/ multiple partition columns, by carefully analyzing provided partition predicates:
+  //
+  //       In cases when partition-predicates have
+  //         - The form of equality predicates w/ static literals (for ex, like `date = '2022-01-01'`)
+  //         - Fully specified proper prefix of the partition schema (ie fully binding first N columns
+  //           of the partition schema adhering to hereby described rules)
+  //
+  // We will try to exploit this specific structure, and try to reduce the scope of a
+  // necessary file-listings of partitions of the table to just the sub-folder under relative prefix
+  // of the partition-path derived from the partition-column predicates. For ex, consider following
+  // scenario:
+  //
+  // Table's partition schema (in-order):
+  //
+  //    country_code: string (for ex, 'us')
+  //    date: string (for ex, '2022-01-01')
+  //
+  // Table's folder structure:
+  //    us/
+  //     |- 2022-01-01/
+  //     |- 2022-01-02/
+  //     ...
+  //
+  // In case we have incoming query specifies following predicates:
+  //
+  //    `... WHERE country_code = 'us' AND date = '2022-01-01'`
+  //
+  // We can deduce full partition-path w/o doing a single listing: `us/2022-01-01`

Review Comment:
   correct
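The prefix analysis described in the NOTE can be illustrated with a small sketch. This is a simplified, hypothetical illustration, not Hudi's implementation: `PartitionPrefixSketch`, `derivePartitionPathPrefix` and `isFullyBound` are made-up names, equality predicates are modeled as a plain `Map` from column name to static literal value instead of Spark Catalyst `Expression`s, and partition folders are assumed to use bare values (e.g. `us/2022-01-01`) rather than Hive-style `country_code=us` naming.

```scala
// Hypothetical, simplified sketch (not Hudi's actual code).
// Equality predicates like `country_code = 'us'` are modeled as
// Map("country_code" -> "us") rather than Catalyst Expressions.
object PartitionPrefixSketch {

  /** Returns the relative partition-path prefix formed by the leading
    * partition columns (in schema order) that are bound by equality
    * predicates, or None if the first column is unbound, in which case
    * no prefix can be derived and a full listing would be required. */
  def derivePartitionPathPrefix(
      partitionColumns: Seq[String],
      equalityPredicates: Map[String, String]): Option[String] = {
    // Walk columns in schema order and stop at the first unbound one:
    // a gap in the bound columns ends the usable prefix.
    val boundValues: Seq[String] =
      partitionColumns.takeWhile(equalityPredicates.contains).map(equalityPredicates)
    if (boundValues.isEmpty) None else Some(boundValues.mkString("/"))
  }

  /** True when every partition column is bound, i.e. the derived prefix is
    * already a full partition path and no sub-folder listing is needed. */
  def isFullyBound(
      partitionColumns: Seq[String],
      equalityPredicates: Map[String, String]): Boolean =
    partitionColumns.forall(equalityPredicates.contains)
}
```

With the partition schema from the comment (`country_code`, `date`), binding both columns yields `Some("us/2022-01-01")`, a full partition path reached without listing. Binding only `country_code` yields `Some("us")`, so only that sub-folder would need to be listed. Binding only `date` yields `None`, because the leading column is unbound and no prefix can be formed.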


