bvaradar commented on a change in pull request #1004: [HUDI-15] Adding delete api to HoodieWriteClient
URL: https://github.com/apache/incubator-hudi/pull/1004#discussion_r344450733
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##########
 @@ -325,6 +326,31 @@ public static SparkConf registerClasses(SparkConf conf) {
     }
   }
 
+  /**
+   * Deletes a bunch of keys from the Hoodie table, at the supplied commitTime
+   */
+  public JavaRDD<WriteStatus> delete(JavaRDD<HoodieKey> keys, final String commitTime) {
+    HoodieTable<T> table = getTableAndInitCtx();
+    try {
+      // De-dupe/merge if needed
+      JavaRDD<HoodieKey> dedupedKeys =
+          combineKeysOnCondition(config.shouldCombineBeforeUpsert(), keys, config.getUpsertShuffleParallelism());
+
+      JavaRDD<HoodieRecord<T>> dedupedRecords = generateHoodieRecordsToDeleteFromKeys(dedupedKeys);
+      indexTimer = metrics.getIndexCtx();
+      // perform index lookup to get existing location of records
+      JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, jsc, table);
 
 Review comment:
   I think we need an important step here: remove the record keys that do not
have a location after tagging. For hard deletes, if the record key to be deleted
is not present in the table, we can simply filter it out here.
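
   A minimal sketch of the filtering step suggested above, written against plain
Java collections so it runs without Spark or Hudi on the classpath. `TaggedRecord`,
`isLocationKnown` and `dropUnlocated` are hypothetical stand-ins, not Hudi APIs. In
the PR itself the equivalent would presumably be a `filter` on `taggedRecords`
using whatever location check `HoodieRecord` exposes; treat that as an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeleteFilterSketch {

  // Hypothetical stand-in for a record after index tagging: a record key plus
  // the file location the index found for it (null if the key is not in the table).
  static final class TaggedRecord {
    final String recordKey;
    final String location; // null means the index found no existing location

    TaggedRecord(String recordKey, String location) {
      this.recordKey = recordKey;
      this.location = location;
    }

    boolean isLocationKnown() {
      return location != null;
    }
  }

  // Keep only the records whose location is known. Keys that are absent from
  // the table have nothing to delete, so they are dropped before the write.
  static List<TaggedRecord> dropUnlocated(List<TaggedRecord> tagged) {
    return tagged.stream()
        .filter(TaggedRecord::isLocationKnown)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<TaggedRecord> tagged = Arrays.asList(
        new TaggedRecord("k1", "file-0001"),
        new TaggedRecord("k2", null), // not present in the table
        new TaggedRecord("k3", "file-0002"));

    List<String> keysToDelete = dropUnlocated(tagged).stream()
        .map(r -> r.recordKey)
        .collect(Collectors.toList());

    System.out.println(keysToDelete); // prints [k1, k3]
  }
}
```

   Filtering before the write means only keys that actually exist in the table
reach the delete path; whether the skipped keys should also be counted in metrics
is left open here.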
