Hi Vinoth - thanks for the quick response. I have followed the mail thread for deletes -> http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<[email protected]>
For your convenience, the code that I use is at the end of this email. An EmptyHoodieRecordPayload record is inserted for the records that need to be deleted. After the delete, I can query from Hive and confirm that the rows intended to be deleted are no longer present, and the records not deleted can be seen in the Hive table via Hive and Presto.

The issue starts when the upsert is done after a delete. The storage type is S3, and I don't think eventual consistency is in play, as the upserted record is visible but the old records that weren't deleted are not visible.

And for the sake of completeness, my insert and upsert logic is based on the code below:
https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43

Thanks
Kabeer.

> /**
>  * Empty payload used for deletions
>  */
> public class EmptyHoodieRecordPayload implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
>   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
>
>   @Override
>   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
>     return another;
>   }
>
>   @Override
>   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) {
>     return Optional.empty();
>   }
>
>   @Override
>   public Optional<IndexedRecord> getInsertValue(Schema schema) {
>     return Optional.empty();
>   }
> }

---------- Forwarded Message ---------
From: Vinoth Chandar <[email protected]>
Subject: Re: Upsert after Delete
Date: Aug 22 2019, at 8:38 pm
To: [email protected]

That’s interesting. Can you also share details on storage type and how you are issuing the deletes and also the table/view (ro, rt) that you are querying?

On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <[email protected]> wrote:
> Hudi experts and Users,
> Has anyone attempted an upsert after a delete?
> Here is a weird thing that I have bumped into, and it is a shame that this came up when someone in the team tested this whilst I failed to run this test.
>
> Use case:
> Insert data into a table. Say records (1, kabeer | 2, vinoth)
>
> Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is visible via SQL through Presto/Hive.
>
> Upsert a new record into the same table (3, balaji). Query the table and the only record that is visible is: (3, balaji). The record (2, vinoth) is not displayed in the results.
>
> Any ideas on what could be at play here? Has someone done upsert after delete?
>
> Thanks,
> Kabeer
>
> PS: Please note that upsert functionality is well tested and if we do (1, vinoth) insert followed by upsert of (2, balaji) both the records are visible. So something else is at play, and I would appreciate any insight that you experts can provide.
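
For reference, the use case in the original message can be written down as a small, self-contained Java model of the expected result. This is only a sketch of the intended semantics and not Hudi code: the class name UpsertAfterDeleteModel and the helper applyBatch are hypothetical, and an empty Optional merely stands in for the empty payload used for deletes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Toy in-memory model of the expected table contents; this is NOT Hudi code.
class UpsertAfterDeleteModel {

    // Apply one write batch keyed by record key. An empty Optional plays the
    // role of the empty payload (delete the key); a present value inserts the
    // key or overwrites its current value. Keys not in the batch are untouched.
    static void applyBatch(Map<Integer, String> table,
                           Map<Integer, Optional<String>> batch) {
        for (Map.Entry<Integer, Optional<String>> e : batch.entrySet()) {
            if (e.getValue().isPresent()) {
                table.put(e.getKey(), e.getValue().get());
            } else {
                table.remove(e.getKey());
            }
        }
    }

    // Replays the three steps from the use case and returns the final table.
    static Map<Integer, String> run() {
        Map<Integer, String> table = new TreeMap<>();
        // Step 1: insert (1, kabeer) and (2, vinoth).
        applyBatch(table, Map.of(1, Optional.of("kabeer"), 2, Optional.of("vinoth")));
        // Step 2: delete (1, kabeer) by writing an empty value for key 1.
        applyBatch(table, Map.of(1, Optional.empty()));
        // Step 3: upsert (3, balaji); key 2 is not in the batch and must survive.
        applyBatch(table, Map.of(3, Optional.of("balaji")));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {2=vinoth, 3=balaji}
    }
}
```

Under this model both (2, vinoth) and (3, balaji) remain after the final upsert, which is the result expected from the table; the behaviour reported in the thread returns only (3, balaji).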
