And if you meant the Hudi storage type, I have left it at the default, COW
(Copy On Write).

If anyone has tried this, please let me know whether you have hit a similar
issue. Any experience would be greatly helpful.
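
To make the expected behaviour concrete, the insert, delete, upsert sequence
from the original report (quoted below) can be modelled with a plain in-memory
map. This is only a sketch under stated assumptions: the class
UpsertAfterDeleteModel and its methods are hypothetical and are not part of
Hudi. It only mirrors the idea that a payload which yields an empty value
removes the record, and it shows the result one would expect to see.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a table keyed by record key.
// This is NOT Hudi code; it only models the semantics the thread expects.
public class UpsertAfterDeleteModel {
    private final Map<Integer, String> rows = new LinkedHashMap<>();

    // An upsert carrying an empty payload is treated as a delete, mirroring
    // a payload whose combineAndGetUpdateValue returns Optional.empty().
    public void upsert(int key, Optional<String> payload) {
        if (payload.isPresent()) {
            rows.put(key, payload.get());
        } else {
            rows.remove(key);
        }
    }

    // Returns a copy of the rows that are currently visible.
    public Map<Integer, String> snapshot() {
        return new LinkedHashMap<>(rows);
    }

    public static void main(String[] args) {
        UpsertAfterDeleteModel table = new UpsertAfterDeleteModel();
        table.upsert(1, Optional.of("kabeer"));  // insert (1, kabeer)
        table.upsert(2, Optional.of("vinoth"));  // insert (2, vinoth)
        table.upsert(1, Optional.empty());       // delete (1, kabeer)
        table.upsert(3, Optional.of("balaji"));  // upsert (3, balaji)
        System.out.println(table.snapshot());    // prints {2=vinoth, 3=balaji}
    }
}
```

In this model (2, vinoth) survives the later upsert, which is the behaviour
the report expects and does not observe.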
On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> Hi Vinoth - thanks for the quick response.
>
> I have followed the mail thread for deletes -> 
> http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
>
> For your convenience, the code that I use is below at the end of the email. 
> An EmptyHoodieRecordPayload is upserted for the records that need to be 
> deleted. After the delete, I can query from Hive and confirm that the rows 
> intended to be deleted are no longer present, and the records not deleted 
> can still be seen in the table via both Hive and Presto.
> The issue starts when the upsert is done after a delete.
> The storage is S3, and I don't think eventual consistency is in play, as 
> the upserted record is visible but the old records that weren't deleted 
> are not visible.
> And for the sake of completeness, my insert and upsert logic is based on 
> the code below: 
> https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> Thanks
> Kabeer.
>
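> > import java.util.Optional;
> > import org.apache.avro.Schema;
> > import org.apache.avro.generic.GenericRecord;
> > import org.apache.avro.generic.IndexedRecord;
> > import org.apache.hudi.common.model.HoodieRecordPayload;
> >
> > /**
> >  * Empty payload used for deletions
> >  */
> > public class EmptyHoodieRecordPayload
> >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> >
> >   // The record and ordering value are ignored: this payload carries no data.
> >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> >
> >   @Override
> >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> >     return another;
> >   }
> >
> >   // An empty result on merge means the existing record is dropped.
> >   @Override
> >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> >       Schema schema) {
> >     return Optional.empty();
> >   }
> >
> >   // An empty result on insert means nothing is written for this key.
> >   @Override
> >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> >     return Optional.empty();
> >   }
> > }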
> ---------- Forwarded Message ---------
>
> From: Vinoth Chandar <vin...@apache.org>
> Subject: Re: Upsert after Delete
> Date: Aug 22 2019, at 8:38 pm
> To: dev@hudi.apache.org
>
> That’s interesting. Can you also share details on storage type and how you
> are issuing the deletes and also the table/view (ro, rt) that you are
> querying?
>
> On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > Hudi experts and Users,
> > Has anyone attempted an upsert after a delete? Here is a weird thing that
> > I have bumped into, and it is a shame that it only came up when someone in
> > the team tested this, as I had failed to run this test myself.
> > Use case:
> > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> >
> > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is
> > visible via SQL through Presto/Hive.
> >
> > Upsert a new record into the same table (3, balaji). Query the table and
> > the only record that is visible is: (3, balaji). The record (2, vinoth) is
> > not displayed in the results.
> >
> > Any ideas on what could be at play here? Has someone done upsert after
> > delete?
> >
> > Thanks,
> > Kabeer
> >
> > PS: Please note that the upsert functionality is well tested, and if we
> > insert (1, vinoth) and then upsert (2, balaji), both records are visible.
> > So something else is at play, and I would appreciate any insight that you
> > experts can provide.
>
