xiarixiaoyao commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r684762348
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##########
@@ -365,6 +385,13 @@ private void
completeClustering(HoodieReplaceCommitMetadata metadata, JavaRDD<Wr
}
finalizeWrite(table, clusteringCommitTime, writeStats);
try {
+ // try to save statistics info to hudi
+ if (config.getOptimizeEnableDataSkipping() &&
!config.getOptimizeSortColumns().isEmpty()) {
+ String basePath = table.getMetaClient().getBasePath();
Review comment:
@satishkotha
no, only the optimize operation produces statistics,
and we save that statistics info under the path `.hoodie/.index`, with the
commitTime as the name:
```
/tmp/mytest/.hoodie/.index/20210808123645
```
Here `20210808123645` is the index table name.
If the indexPath has no index table, we save the statistics info directly as a
parquet table named after the commitTime.
If the indexPath has an old index table, we update it with the new statistics
info using a full outer join, then save the merged result into a new parquet
table named after the commitTime.
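To illustrate the merge step above, here is a minimal, hypothetical sketch (not the actual Hudi code; file names and column names are made up) of full-outer-join semantics over per-file statistics, where the newer statistics win for files present on both sides:

```python
# Old index table and new statistics from the latest optimize commit.
# Each entry maps a data file to its min/max statistics for column c1.
old_index = {
    "file_a.parquet": {"c1_min": 1, "c1_max": 9},
    "file_b.parquet": {"c1_min": 5, "c1_max": 20},
}
new_stats = {
    "file_b.parquet": {"c1_min": 4, "c1_max": 18},   # rewritten file: new stats win
    "file_c.parquet": {"c1_min": 30, "c1_max": 42},  # newly written file
}

def full_outer_merge(old, new):
    # Full-outer-join semantics: keep every file seen on either side,
    # preferring the newer statistics when a file appears on both.
    merged = {}
    for f in old.keys() | new.keys():
        merged[f] = new.get(f, old.get(f))
    return merged

merged = full_outer_merge(old_index, new_stats)
# merged now contains file_a (old), file_b (updated), and file_c (new)
```

In the real implementation this would be a Spark full outer join between the old index parquet table and the new statistics, written out as a new parquet table named after the commitTime.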
In the HoodieFileIndex, we do data skipping using the latest index table:
filters from the query statement are converted into filters on the index
table, which selects the candidate files, and then the file listing is pruned
accordingly.
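As a rough, hypothetical sketch of that pruning step (again, not the actual Hudi code; the index layout and column names are assumptions), a query predicate like `c1 = 7` becomes a min/max range check against the index table:

```python
# Index table rows: one row per data file, with min/max stats for column c1.
index_table = [
    {"file": "file_a.parquet", "c1_min": 1,  "c1_max": 9},
    {"file": "file_b.parquet", "c1_min": 4,  "c1_max": 18},
    {"file": "file_c.parquet", "c1_min": 30, "c1_max": 42},
]

def candidate_files(index, value):
    # Keep only files whose [min, max] range could contain the value;
    # all other files can be skipped without being read.
    return [row["file"] for row in index
            if row["c1_min"] <= value <= row["c1_max"]]

files = candidate_files(index_table, 7)
# file_c is skipped: its c1 range [30, 42] cannot contain 7
```

The benefit of z-order/Hilbert clustering is precisely that it keeps these per-file min/max ranges narrow across several columns, so this simple check skips many files.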
Of course this method is simple, but it is enough to do data skipping for
z-order/Hilbert optimization. RFC-27 is a promising feature for data skipping,
but it is not yet completed. Once RFC-27 has been completed, I will adapt to
it.
If possible, I would like to participate in the development of RFC-27.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]