I see. Will look at it from that context as well, then.
I am still trying to repro this. Hopefully, if there is a bug, we can
quickly fix and get it into the first release.

On Wed, Aug 28, 2019 at 11:48 PM Jaimin Shah <[email protected]>
wrote:

> Hi
>   I remember I was also facing some issues with deletes. Maybe both issues
> are related? After deletes, I was not able to query data. At that time,
> this issue was filed:
>
> https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
>
> Is this issue now resolved?
>
> Thanks,
> Jaimin
>
> On Wed, 28 Aug 2019 at 23:29, [email protected] <[email protected]>
> wrote:
>
> >
> > Hi Kabeer,
> > I have requested some information in the github ticket.
> > Balaji.V
> >
> > On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <[email protected]> wrote:
> >
> > Thanks for the quick response Vinoth. That is what I would have thought:
> > there is nothing complex or different in upsert after a delete. Yes, I
> > can reproduce the issue with the simple example that I have written in the
> > email.
> >
> > I have dug into the issue in detail and it seems it is a bug. I have filed
> > it at: https://github.com/apache/incubator-hudi/issues/859
> > Let me know if more information is required.
> > Thank you,
> >
> > On Aug 23 2019, at 1:37 am, Vinoth Chandar <[email protected]> wrote:
> > > yes. I was asking about the HUDI storage type..
> > >
> > > > There is nothing complex about upsert() after delete(). It is almost as
> > > > if a delete() for (2, vinoth) happened in between.
> > >
> > > > Are you able to repro this literally with this tiny example with 3
> > > > records?
> > > > Some things to check:
> > > >
> > > > - This sequence would have created 3 commits. You can look at the commit
> > > > files and see if the numbers of records updated, inserted, and deleted
> > > > match expectations.
> > > > - If they do, then you can use spark.read.parquet(...) on the individual
> > > > parquet files and see what records they actually contain.
> > >
> > > > This should shed some light on the pattern of failure and when exactly
> > > > (2, vinoth) disappeared.
> > > >
> > > > Alternatively, if you can give a small snippet that reproduces this, we
> > > > can debug from there.
> > >
> > >
> > >
> > >
> > >
> > >
> > > > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <[email protected]> wrote:
> > > > > And if you meant HUDI storage type, I have left it at the default,
> > > > > COW (Copy On Write).
> > > >
> > > > > If anyone has tried this, please let me know if you have hit a similar
> > > > > issue. Any experience would be greatly helpful.
> > > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <[email protected]>
> > wrote:
> > > > > Hi Vinoth - thanks for the quick response.
> > > > >
> > > > > I have followed the mail thread for deletes ->
> > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<
> > > > [email protected]>
> > > > >
> > > > > > For your convenience, the code that I use is below at the end of the
> > > > > > email. EmptyHoodieRecord is inserted for the relevant records that
> > > > > > need to be deleted. After the delete, I can query from Hive and confirm
> > > > > > that the rows intended to be deleted are no longer present and the
> > > > > > records not deleted can be seen in the Hive table via Hive and Presto.
> > > > > > The issue starts when the upsert is done after a delete.
> > > > > > The storage type is S3 and I don't think there is any eventual
> > > > > > consistency in play, as the record upserted is visible but the old
> > > > > > records that weren't deleted are not visible.
> > > > > > And for the sake of completeness, my insert and upsert logic is based
> > > > > > on the code below:
> > > >
> >
> https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > > Thanks
> > > > > Kabeer.
> > > > >
> > > > > > /**
> > > > > > * Empty payload used for deletions
> > > > > > */
> > > > > > public class EmptyHoodieRecordPayload implements
> > > > >
> > > >
> > > > HoodieRecordPayload<EmptyHoodieRecordPayload>
> > > > > > {
> > > > > > public EmptyHoodieRecordPayload(GenericRecord record, Comparable
> > > > >
> > > >
> > > > orderingVal) { }
> > > > > > @Override
> > > > > > public EmptyHoodieRecordPayload
> preCombine(EmptyHoodieRecordPayload
> > > > >
> > > >
> > > > another) {
> > > > > > return another;
> > > > > > }
> > > > > > @Override
> > > > > > public Optional<IndexedRecord>
> > combineAndGetUpdateValue(IndexedRecord
> > > > >
> > > >
> > > > currentValue,
> > > > > > chema schema) {
> > > > > > return Optional.empty();
> > > > > > }
> > > > > > @Override
> > > > > > public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > > > return Optional.empty();
> > > > > > }
> > > > > > }
> > > > >
> > > > > ---------- Forwarded Message ---------
> > > > >
> > > > > From: Vinoth Chandar <[email protected]>
> > > > > Subject: Re: Upsert after Delete
> > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > To: [email protected]
> > > > >
> > > > > > That’s interesting. Can you also share details on storage type and
> > > > > > how you are issuing the deletes and also the table/view (ro, rt) that
> > > > > > you are querying?
> > > > >
> > > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <[email protected]> wrote:
> > > > > > > Hudi experts and Users,
> > > > > > > Has anyone attempted an upsert after a delete? Here is a weird thing
> > > > > > > that I have bumped into, and it is a shame that this has come up when
> > > > > > > someone in the team tested this whilst I failed to run this test.
> > > > > > > Use case:
> > > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > > > >
> > > > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it
> > > > > > > is visible via SQL through Presto/Hive.
> > > > > > >
> > > > > > > Upsert a new record into the same table (3, balaji). Query the table
> > > > > > > and the only record that is visible is: (3, balaji). The record
> > > > > > > (2, vinoth) is not displayed in the results.
> > > > > > >
> > > > > > > Any ideas on what could be at play here? Has someone done upsert after
> > > > > > > delete?
> > > > > >
> > > > > > Thanks,
> > > > > > Kabeer
> > > > > >
> > > > > > > PS: Please note that upsert functionality is well tested and if we do
> > > > > > > (1, vinoth) insert followed by upsert of (2, balaji), both the records
> > > > > > > are visible. So something else is at play and I would appreciate any
> > > > > > > insight that you experts can provide.
> > > > >
> > > >
> > >
> > >
> >
>
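
For reference, the outcome expected from the insert, delete, upsert
sequence in the original report can be sketched with a toy in-memory model.
This is only an illustration under stated assumptions: it uses a plain Java
map rather than any Hudi API, and the names UpsertAfterDeleteModel,
applyBatch and runSequence are hypothetical. An empty Optional stands in for
the empty payload that marks a delete.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Toy in-memory model of a keyed table. This is NOT the Hudi API.
class UpsertAfterDeleteModel {

    // Apply one batch: a present value upserts the key, an empty value
    // deletes it (standing in for the empty delete payload in the thread).
    static void applyBatch(Map<Integer, String> table,
                           Map<Integer, Optional<String>> batch) {
        for (Map.Entry<Integer, Optional<String>> e : batch.entrySet()) {
            if (e.getValue().isPresent()) {
                table.put(e.getKey(), e.getValue().get());
            } else {
                table.remove(e.getKey());
            }
        }
    }

    // Replays the three steps from the report, one batch per step.
    static Map<Integer, String> runSequence() {
        Map<Integer, String> table = new TreeMap<>();
        // Step 1: insert (1, kabeer) and (2, vinoth).
        applyBatch(table, Map.of(1, Optional.of("kabeer"), 2, Optional.of("vinoth")));
        // Step 2: delete (1, kabeer).
        applyBatch(table, Map.of(1, Optional.<String>empty()));
        // Step 3: upsert (3, balaji).
        applyBatch(table, Map.of(3, Optional.of("balaji")));
        return table;
    }

    public static void main(String[] args) {
        // Prints {2=vinoth, 3=balaji}
        System.out.println(runSequence());
    }
}
```

Under this model, record 2 survives the later upsert of record 3, which is
the behaviour the report expected to see when querying the table.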
