Yes, I was asking about the HUDI storage type.

There is nothing complex about upsert() after delete(). It is almost as if a
delete() for (2, vinoth) happened in between.

Are you able to repro this literally with this tiny example with 3 records?
Some things to check:

 - This sequence would have created 3 commits. You can look at the commit
files and see whether the numbers of records updated, inserted and deleted
match expectations.
 - If they do, then you can use spark.read.parquet(...) on the individual
parquet files and see what records they actually contain.

This should shed some light on the pattern of failure and when exactly (2,
vinoth) disappeared.
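To pin down what "match expectations" means for the three commits, here is a
toy, in-memory model of the snapshot a query should return after each one. It
is a sketch only: CowMergeModel and Incoming are hypothetical names, not Hudi
classes, and the model ignores indexing, file groups and small-file handling.
An empty value stands in for EmptyHoodieRecordPayload.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Toy model (hypothetical, not Hudi code) of the snapshot a reader should see
 * after each commit. It ignores indexing, file groups and small-file handling:
 * every commit is simply merged into the records that are already visible.
 */
public class CowMergeModel {

    /** An incoming record; an empty value stands in for EmptyHoodieRecordPayload. */
    public static final class Incoming {
        final int key;
        final Optional<String> value;

        public Incoming(int key, Optional<String> value) {
            this.key = key;
            this.value = value;
        }
    }

    // Records currently visible to a query: record key -> value.
    private final Map<Integer, String> visible = new LinkedHashMap<>();

    /** Applies one commit: present values are inserted or updated, empty ones deleted. */
    public void commit(List<Incoming> batch) {
        for (Incoming rec : batch) {
            if (rec.value.isPresent()) {
                visible.put(rec.key, rec.value.get());
            } else {
                visible.remove(rec.key);
            }
        }
    }

    /** Returns a copy of the currently visible records. */
    public Map<Integer, String> snapshot() {
        return new LinkedHashMap<>(visible);
    }

    public static void main(String[] args) {
        CowMergeModel table = new CowMergeModel();
        // Commit 1: insert (1, kabeer) and (2, vinoth).
        table.commit(List.of(new Incoming(1, Optional.of("kabeer")),
                             new Incoming(2, Optional.of("vinoth"))));
        // Commit 2: delete (1, kabeer) with an empty payload.
        table.commit(List.of(new Incoming(1, Optional.empty())));
        // Commit 3: upsert (3, balaji); key 2 is untouched and must survive.
        table.commit(List.of(new Incoming(3, Optional.of("balaji"))));
        System.out.println(table.snapshot()); // prints {2=vinoth, 3=balaji}
    }
}
```

Any divergence between this expected snapshot and what the parquet files of
the latest commit actually contain points at the commit where (2, vinoth) was
lost.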

Alternatively, if you can give a small snippet that reproduces this, we can
debug from there.






On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <[email protected]> wrote:

> And if you meant the HUDI storage type, I have left it at the default, COW
> (Copy On Write).
>
> If anyone has tried this, please let me know whether you have hit a similar
> issue. Any experience would be very helpful.
> On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <[email protected]> wrote:
> > Hi Vinoth - thanks for the quick response.
> >
> > I have followed the mail thread for deletes ->
> http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<[email protected]>
> >
> > For your convenience, the code that I use is at the end of this email.
> An EmptyHoodieRecordPayload record is inserted for each record that needs
> to be deleted. After the delete, I can query from Hive and confirm that the
> rows intended to be deleted are no longer present, and the records that were
> not deleted can be seen in the Hive table via Hive and Presto.
> > The issue starts when the upsert is done after a delete.
> > The storage is S3, and I don't think eventual consistency is in play,
> as the upserted record is visible but the old records that weren't deleted
> are not.
> > And for the sake of completeness, my insert and upsert logic is based on
> the code below:
> https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > Thanks
> > Kabeer.
> >
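> > > import java.util.Optional;
> > >
> > > import org.apache.avro.Schema;
> > > import org.apache.avro.generic.GenericRecord;
> > > import org.apache.avro.generic.IndexedRecord;
> > > // HoodieRecordPayload: import from the package of the Hudi release in use.
> > >
> > > /**
> > >  * Empty payload used for deletions: it never yields a value, so the
> > >  * record is dropped both on insert and when merged with a stored record.
> > >  */
> > > public class EmptyHoodieRecordPayload
> > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > >
> > >   // (GenericRecord, Comparable) constructor; both arguments are ignored.
> > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > >
> > >   // Two deletes for the same key in one batch: keep one of them.
> > >   @Override
> > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > >     return another;
> > >   }
> > >
> > >   // Merging with the stored record yields nothing, so the record is removed.
> > >   @Override
> > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > >       Schema schema) {
> > >     return Optional.empty();
> > >   }
> > >
> > >   // No value to write when the key does not exist yet.
> > >   @Override
> > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > >     return Optional.empty();
> > >   }
> > > }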
> > ---------- Forwarded Message ---------
> >
> > From: Vinoth Chandar <[email protected]>
> > Subject: Re: Upsert after Delete
> > Date: Aug 22 2019, at 8:38 pm
> > To: [email protected]
> >
> > That’s interesting. Can you also share details on the storage type, how
> > you are issuing the deletes, and the table/view (ro, rt) that you are
> > querying?
> >
> > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <[email protected]> wrote:
> > > Hudi experts and users,
> > > Has anyone attempted an upsert after a delete? Here is a weird thing that
> > > I have bumped into, and it is a shame that it only came up when someone
> > > in the team tested it, as I had not run this test myself.
> > > Use case:
> > > Insert data into a table, say records (1, kabeer) and (2, vinoth).
> > >
> > > Delete a record, (1, kabeer). The data in the table is now (2, vinoth),
> > > and it is visible via SQL through Presto/Hive.
> > >
> > > Upsert a new record, (3, balaji), into the same table. When querying the
> > > table, the only record visible is (3, balaji). The record (2, vinoth) is
> > > not displayed in the results.
> > >
> > > Any ideas on what could be at play here? Has anyone done an upsert after
> > > a delete?
> > >
> > > Thanks,
> > > Kabeer
> > >
> > > PS: Please note that the upsert functionality is well tested: if we insert
> > > (1, vinoth) and then upsert (2, balaji), both records are visible. So
> > > something else is at play, and I would appreciate any insight you experts
> > > can provide.
> >
> >
> >
> >
>
>
