And if you meant the Hudi storage type, I have left it at the default, COW (Copy On Write).
If anyone has tried this, please let me know whether you have hit a similar issue. Any experience would be greatly helpful.

On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> Hi Vinoth - thanks for the quick response.
>
> I have followed the mail thread for deletes ->
> http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
>
> For your convenience, the code that I use is below at the end of the email.
> An EmptyHoodieRecordPayload is inserted for the relevant records that need
> to be deleted. After the delete, I can query from Hive and confirm that the
> rows intended to be deleted are no longer present and the records not
> deleted can be seen in the Hive table via Hive and Presto.
>
> The issue starts when the upsert is done after a delete.
>
> The storage type is S3 and I don't think there is any eventual consistency
> in play, as the record upserted is visible but the old records that weren't
> deleted are not visible.
>
> And for the sake of completeness, my insert and upsert logic is based on
> the code below:
> https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
>
> Thanks
> Kabeer.
>
> > /**
> >  * Empty payload used for deletions
> >  */
> > public class EmptyHoodieRecordPayload implements
> >     HoodieRecordPayload<EmptyHoodieRecordPayload> {
> >
> >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> >
> >   @Override
> >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> >     return another;
> >   }
> >
> >   // An empty result on merge means the existing record is dropped.
> >   @Override
> >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> >       Schema schema) {
> >     return Optional.empty();
> >   }
> >
> >   // An empty result on insert means nothing is written for this key.
> >   @Override
> >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> >     return Optional.empty();
> >   }
> > }
>
> ---------- Forwarded Message ---------
> From: Vinoth Chandar <vin...@apache.org>
> Subject: Re: Upsert after Delete
> Date: Aug 22 2019, at 8:38 pm
> To: dev@hudi.apache.org
>
> That’s interesting. Can you also share details on storage type and how you
> are issuing the deletes and also the table/view (ro, rt) that you are
> querying?
>
> On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > Hudi experts and Users,
> >
> > Has anyone attempted an upsert after a delete? Here is a weird thing that
> > I have bumped into, and it is a shame that it only came up when someone in
> > the team tested this, as I had failed to run this test myself.
> >
> > Use case:
> >
> > Insert data into a table. Say records (1, kabeer | 2, vinoth).
> >
> > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is
> > visible via SQL through Presto/Hive.
> >
> > Upsert a new record into the same table (3, balaji). Query the table and
> > the only record that is visible is: (3, balaji). The record (2, vinoth) is
> > not displayed in the results.
> >
> > Any ideas on what could be at play here? Has someone done an upsert after
> > a delete?
> >
> > Thanks,
> > Kabeer
> >
> > PS: Please note that the upsert functionality is well tested, and if we do
> > an insert of (1, vinoth) followed by an upsert of (2, balaji), both records
> > are visible. So something else is at play, and I would appreciate any
> > insight that you experts can provide.
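
For anyone trying to reproduce this, here is a minimal in-memory sketch of the result expected from the insert, delete, upsert sequence described in the thread. It is not Hudi code: the class UpsertAfterDeleteSketch, the Payload interface and the helper methods are hypothetical names made up for illustration, and the "table" is just a map keyed by record id. The only assumption carried over from the thread is that a payload returning an empty value removes the existing record, which is the idea behind EmptyHoodieRecordPayload.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy model only: hypothetical names, not part of the Hudi API.
class UpsertAfterDeleteSketch {

    // Hypothetical stand-in for a record payload.
    interface Payload {
        Optional<String> combineAndGetUpdateValue(String currentValue);
        Optional<String> getInsertValue();
    }

    // Payload carrying a value: it replaces any existing value for the key.
    static Payload value(final String v) {
        return new Payload() {
            public Optional<String> combineAndGetUpdateValue(String currentValue) {
                return Optional.of(v);
            }
            public Optional<String> getInsertValue() {
                return Optional.of(v);
            }
        };
    }

    // Empty payload: always yields no value, so the key ends up deleted.
    static Payload empty() {
        return new Payload() {
            public Optional<String> combineAndGetUpdateValue(String currentValue) {
                return Optional.empty();
            }
            public Optional<String> getInsertValue() {
                return Optional.empty();
            }
        };
    }

    // Merge one record into the table; other keys are left untouched.
    static void upsert(Map<Integer, String> table, int key, Payload payload) {
        Optional<String> merged = table.containsKey(key)
                ? payload.combineAndGetUpdateValue(table.get(key))
                : payload.getInsertValue();
        if (merged.isPresent()) {
            table.put(key, merged.get());
        } else {
            table.remove(key);
        }
    }

    // The sequence from the thread: insert two rows, delete one, upsert a new one.
    static Map<Integer, String> runScenario() {
        Map<Integer, String> table = new LinkedHashMap<>();
        upsert(table, 1, value("kabeer"));
        upsert(table, 2, value("vinoth"));
        upsert(table, 1, empty());
        upsert(table, 3, value("balaji"));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Running main prints {2=vinoth, 3=balaji}, which is what a query through Hive or Presto would be expected to return. The behaviour reported in the thread shows only (3, balaji) instead.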