cshuo commented on code in PR #12132:
URL: https://github.com/apache/hudi/pull/12132#discussion_r1810177941


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java:
##########
@@ -177,7 +174,8 @@ public List<StoragePathInfo> getFilesInPartitions() {
     }
 
     // data skipping
-    Set<String> candidateFiles = candidateFilesInMetadataTable(allFiles);
+    Set<String> candidateFiles = ColumnStatsIndices.candidateFilesInMetadataTable(path.toString(), metadataConfig,
+        rowType, dataPruner, allFiles.stream().map(StoragePathInfo::toString).collect(Collectors.toList()));

Review Comment:
   Nice catch!



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -378,6 +378,14 @@ private FlinkOptions() {
      .withDescription("Enables data-skipping allowing queries to leverage indexes to reduce the search space by "
           + "skipping over files");
 
+  @AdvancedConfig
+  public static final ConfigOption<Boolean> READ_PARTITION_DATA_SKIPPING_ENABLED = ConfigOptions
+      .key("read.partition.data.skipping.enabled")

Review Comment:
   OK, I'll update.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java:
##########
@@ -85,16 +89,109 @@ public class ColumnStatsIndices {
   private ColumnStatsIndices() {
   }
 
-  public static List<RowData> readColumnStatsIndex(String basePath, HoodieMetadataConfig metadataConfig, String[] targetColumns) {
+  public static Set<String> candidatePartitionsInMetadataTable(

Review Comment:
   I did not add abstractions in this PR because the Partition Stats Index and 
the Column Stats Index share the same API and the same storage schema in the 
metadata table. That said, introducing some basic abstractions here sounds 
fine, since we plan to add more indexes.
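
   For illustration only, a basic abstraction could look roughly like the 
sketch below. It assumes both indexes expose per-target min/max stats with the 
same shape, so the pruning logic can live in one shared base class. All names 
(`StatsIndexSketch`, `Range`, `StatsIndex`, `AbstractStatsIndex`, 
`FileStatsIndex`, `PartitionStatsIndex`) are hypothetical and are not part of 
this PR or of the Hudi API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: every name here is hypothetical, none of it is Hudi API.
final class StatsIndexSketch {

  /** Min/max of one column for one pruning target (a file name or a partition path). */
  static final class Range {
    final String target;
    final long min;
    final long max;

    Range(String target, long min, long max) {
      this.target = target;
      this.min = min;
      this.max = max;
    }
  }

  /** Common contract: keep only the targets whose stats may satisfy the filter. */
  interface StatsIndex {
    String indexName();

    Set<String> candidates(List<Range> stats, long lower, long upper);
  }

  /** Shared pruning logic, assuming both indexes store stats in the same shape. */
  abstract static class AbstractStatsIndex implements StatsIndex {
    @Override
    public Set<String> candidates(List<Range> stats, long lower, long upper) {
      Set<String> result = new LinkedHashSet<>();
      for (Range r : stats) {
        // A target survives when [min, max] overlaps the queried [lower, upper].
        if (r.max >= lower && r.min <= upper) {
          result.add(r.target);
        }
      }
      return result;
    }
  }

  /** File-level pruning; the name string is just a label for this sketch. */
  static final class FileStatsIndex extends AbstractStatsIndex {
    @Override
    public String indexName() {
      return "column_stats";
    }
  }

  /** Partition-level pruning; differs from the file-level index only in its label. */
  static final class PartitionStatsIndex extends AbstractStatsIndex {
    @Override
    public String indexName() {
      return "partition_stats";
    }
  }

  public static void main(String[] args) {
    List<Range> partitionStats = Arrays.asList(
        new Range("dt=2024-10-01", 1, 10),
        new Range("dt=2024-10-02", 11, 20));
    StatsIndex index = new PartitionStatsIndex();
    // Query range [12, 15] overlaps only the second partition's [11, 20].
    System.out.println(index.indexName() + " -> " + index.candidates(partitionStats, 12, 15));
  }
}
```

   With this shape, a new index would only need to extend the shared base and 
supply its own name, or override `candidates` if its stats differ.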



