bvaradar commented on a change in pull request #1004: [HUDI-328] Adding delete api to HoodieWriteClient
URL: https://github.com/apache/incubator-hudi/pull/1004#discussion_r346440995
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##########
 @@ -325,6 +326,31 @@ public static SparkConf registerClasses(SparkConf conf) {
     }
   }
 
+  /**
+   * Deletes a bunch of keys from the Hoodie table, at the supplied commitTime
+   */
+  public JavaRDD<WriteStatus> delete(JavaRDD<HoodieKey> keys, final String commitTime) {
+    HoodieTable<T> table = getTableAndInitCtx();
+    try {
+      // De-dupe/merge if needed
+      JavaRDD<HoodieKey> dedupedKeys =
+          combineKeysOnCondition(config.shouldCombineBeforeUpsert(), keys, config.getUpsertShuffleParallelism());
+
+      JavaRDD<HoodieRecord<T>> dedupedRecords = generateHoodieRecordsToDeleteFromKeys(dedupedKeys);
+      indexTimer = metrics.getIndexCtx();
+      // perform index lookup to get existing location of records
+      JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, jsc, table);
 
 Review comment:
  Can you double-check? We allow inserts to happen as part of upsert() calls, and this delete path is similar. Also, look at HoodieReadClient.filterExists; you need the opposite of that.
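To illustrate the reviewer's point, here is a minimal sketch in plain Java (no Spark, no Hudi classes). It assumes filterExists keeps only records whose current location the index did not find, that is, the records an upsert would write as inserts. The "opposite" then keeps only records that already exist, so deleting an unknown key cannot fall through to the insert path. `TaggedRecord`, `DeleteFilterSketch`, and `filterToExisting` are hypothetical names for this illustration only; in the actual PR the equivalent filter would be applied to the `taggedRecords` RDD after `index.tagLocation(...)`.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class DeleteFilterSketch {

    // Hypothetical, simplified stand-in for a record after index tagging (not a Hudi class).
    static final class TaggedRecord {
        final String recordKey;
        final String partitionPath;
        // File location reported by the index lookup; empty when the key is not in the table.
        final Optional<String> currentLocation;

        TaggedRecord(String recordKey, String partitionPath, Optional<String> currentLocation) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
            this.currentLocation = currentLocation;
        }

        boolean isCurrentLocationKnown() {
            return currentLocation.isPresent();
        }
    }

    private DeleteFilterSketch() {}

    // Same idea as filterExists: keep records the index did NOT find,
    // i.e. the ones an upsert would write as inserts.
    static List<TaggedRecord> filterExists(List<TaggedRecord> tagged) {
        return tagged.stream()
                .filter(r -> !r.isCurrentLocationKnown())
                .collect(Collectors.toList());
    }

    // The opposite (hypothetical name): keep only records whose key already exists,
    // so a delete of an unknown key is dropped instead of being treated as an insert.
    static List<TaggedRecord> filterToExisting(List<TaggedRecord> tagged) {
        return tagged.stream()
                .filter(TaggedRecord::isCurrentLocationKnown)
                .collect(Collectors.toList());
    }
}
```

With one key found by the index and one not, `filterExists` returns only the unknown key, while `filterToExisting` returns only the known one, which is the set a delete should actually act on.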
