Hi,
    I posted the issue on the Slack channel. Balaji looked into it, verified the
issue, and then set up the Jira task which I shared earlier. This issue occurred
on a MOR table when the last step was deleting data. The query after the delete was
failing because HUDI was not able to read the schema from the last record, as it was
written with an empty payload. I am not sure whether it has been fixed yet.
Maybe Vinoth can confirm it.

Thanks,
Jaimin

On Fri, 30 Aug 2019 at 22:30, Kabeer Ahmed <kab...@linuxmail.org> wrote:

> Jaimin
>
> There could be many reasons why you couldn't see or query any data after a
> delete. If you tell me the exact mechanism, I can help.
> On Aug 29 2019, at 7:47 am, Jaimin Shah <shahjaimin0...@gmail.com> wrote:
> > Hi
> > I remember I was also facing some issues with deletes. Maybe both issues
> > are related? After deletes I was not able to query data. At that time
> > this issue was filed:
> > https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
> > Is this issue now resolved?
> >
> > Thanks,
> > Jaimin
> >
> > On Wed, 28 Aug 2019 at 23:29, vbal...@apache.org <vbal...@apache.org>
> wrote:
> > >
> > > Hi Kabeer,
> > > I have requested some information in the github ticket.
> > > Balaji.V
> > >
> > > On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > >
> > > Thanks for the quick response Vinoth. That is what I would have thought:
> > > there is nothing complex or different in an upsert after a delete. Yes, I
> > > can reproduce the issue with the simple example that I wrote in the
> > > email.
> > >
> > > I have dug into the issue in detail and it seems to be a bug. I have filed
> > > it at: https://github.com/apache/incubator-hudi/issues/859
> > > Let me know if more information is required.
> > > Thank you,
> > >
> > > On Aug 23 2019, at 1:37 am, Vinoth Chandar <vin...@apache.org> wrote:
> > > > Yes. I was asking about the HUDI storage type.
> > > >
> > > > There is nothing complex about upsert() after delete(). It is almost as if a
> > > > delete() for (2, vinoth) happened in between.
> > > >
> > > > Are you able to repro this literally with this tiny example with 3 records?
> > > > Some things to check:
> > > >
> > > > - This sequence would have created 3 commits. You can look at the commit
> > > > files and see if the numbers of records updated, inserted, and deleted match
> > > > expectations.
> > > > - If they do, then you can use spark.read.parquet(.) on the individual
> > > > parquet files and see what records they actually contain.
> > > >
> > > > This should shed some light on the pattern of failure and when exactly (2,
> > > > vinoth) disappeared.
> > > >
> > > > Alternatively, if you can give a small snippet that reproduces this, we can
> > > > debug from there.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > And if you meant HUDI storage type, I have left it at the default COW - Copy
> > > > > On Write.
> > > > >
> > > > > If anyone has tried this please let me know if you have hit a similar issue.
> > > > > Any experience would be greatly helpful.
> > > > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > Hi Vinoth - thanks for the quick response.
> > > > > >
> > > > > > I have followed the mail thread for deletes ->
> > > > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
> > > > > >
> > > > > > For your convenience, the code that I use is below at the end of the
> > > > > > email. An EmptyHoodieRecordPayload is inserted for the relevant records that
> > > > > > need to be deleted. After the delete, I can query from Hive and confirm that
> > > > > > the rows intended to be deleted are no longer present and the records not
> > > > > > deleted can be seen in the Hive table via Hive and Presto.
> > > > > > The issue starts when the upsert is done after a delete.
> > > > > > The storage type is S3 and I don't think there is any eventual
> > > > > > consistency in play, as the record upserted is visible but the old records
> > > > > > that weren't deleted are not visible.
> > > > > > And for the sake of completeness, my insert and upsert logic is based on
> > > > > > the code below:
> > > > > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > > > Thanks
> > > > > > Kabeer.
> > > > > >
> > > > > > > /**
> > > > > > >  * Empty payload used for deletions
> > > > > > >  */
> > > > > > > public class EmptyHoodieRecordPayload implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > > > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > > > > >     return another;
> > > > > > >   }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) {
> > > > > > >     return Optional.empty();
> > > > > > >   }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > > > >     return Optional.empty();
> > > > > > >   }
> > > > > > > }
> > > > > >
> > > > > >
> > > > > > ---------- Forwarded Message ---------
> > > > > > From: Vinoth Chandar <vin...@apache.org>
> > > > > > Subject: Re: Upsert after Delete
> > > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > > To: dev@hudi.apache.org
> > > > > >
> > > > > > That's interesting. Can you also share details on storage type and how you
> > > > > > are issuing the deletes and also the table/view (ro, rt) that you are
> > > > > > querying?
> > > > > >
> > > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > Hudi experts and Users,
> > > > > > > Has anyone attempted an upsert after a delete? Here is a weird thing that
> > > > > > > I have bumped into, and it is a shame that this has come up when someone in
> > > > > > > the team tested this whilst I failed to run this test.
> > > > > > > Use case:
> > > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > > > >
> > > > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is
> > > > > > > visible via sql through Presto/Hive.
> > > > > > >
> > > > > > > Upsert a new record into the same table (3, balaji). Query the table and
> > > > > > > the only record that is visible is: (3, balaji). The record (2, vinoth) is
> > > > > > > not displayed in the results.
> > > > > > >
> > > > > > > Any ideas on what could be at play here? Has someone done upsert after
> > > > > > > delete?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kabeer
> > > > > > >
> > > > > > > PS: Please note that upsert functionality is well tested and if we do (1,
> > > > > > > vinoth) insert followed by upsert of (2, balaji) both the records are
> > > > > > > visible. So something else is at play and I would appreciate any insight
> > > > > > > that you experts can provide.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
>
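
For reference, the behaviour expected in the original report (insert, then
delete, then upsert) can be sketched with a toy in-memory model. This is not
Hudi code and uses no Hudi API: the class UpsertAfterDeleteSketch, its methods,
and the use of a plain map as the "table" are all hypothetical, chosen only to
show that a write carrying an empty payload should remove one key and leave
every other record untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy model only (an assumption for illustration, not Hudi's implementation):
// a table is a map from record key to value, and a write whose payload is
// Optional.empty() is treated as a delete of that key.
class UpsertAfterDeleteSketch {

    // Apply one batch of writes to the table, in place.
    static void upsert(Map<Integer, String> table, Map<Integer, Optional<String>> batch) {
        for (Map.Entry<Integer, Optional<String>> e : batch.entrySet()) {
            if (e.getValue().isPresent()) {
                table.put(e.getKey(), e.getValue().get()); // insert or update
            } else {
                table.remove(e.getKey());                  // empty payload: delete
            }
        }
    }

    // Replay the three commits from the report and return the final table.
    static Map<Integer, String> run() {
        Map<Integer, String> table = new LinkedHashMap<>();

        // Commit 1: insert (1, kabeer) and (2, vinoth).
        Map<Integer, Optional<String>> commit1 = new LinkedHashMap<>();
        commit1.put(1, Optional.of("kabeer"));
        commit1.put(2, Optional.of("vinoth"));
        upsert(table, commit1);

        // Commit 2: delete (1, kabeer) by writing an empty payload for key 1.
        Map<Integer, Optional<String>> commit2 = new LinkedHashMap<>();
        commit2.put(1, Optional.empty());
        upsert(table, commit2);

        // Commit 3: upsert the new record (3, balaji).
        Map<Integer, Optional<String>> commit3 = new LinkedHashMap<>();
        commit3.put(3, Optional.of("balaji"));
        upsert(table, commit3);

        return table;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {2=vinoth, 3=balaji}
    }
}
```

In this model (2, vinoth) survives the third commit, which is the result the
report expected; the thread describes a real table returning only (3, balaji).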
