This is a COW table, so these are definitely different issues. If you or Balaji
can pull up the JIRA, I'm happy to take a stab at it.

On Sat, Aug 31, 2019 at 10:02 AM Jaimin Shah <shahjaimin0...@gmail.com>
wrote:

> Hi,
>     I posted the issue on the Slack channel. Balaji looked into it, verified
> the issue, and then set up the Jira task which I shared earlier. This issue
> occurred on a MOR table when the last step was deleting data. The query after
> the delete was failing because HUDI was not able to read the schema from the
> last record, as it was written by an empty payload. I am not sure whether it
> is fixed now. Maybe Vinoth can confirm.
>
> Thanks,
> Jaimin
>
> On Fri, 30 Aug 2019 at 22:30, Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> > Jaimin
> >
> > There could be many reasons why you couldn't see or query any data after a
> > delete. If you tell me the exact mechanism, I can help.
> > On Aug 29 2019, at 7:47 am, Jaimin Shah <shahjaimin0...@gmail.com> wrote:
> > > Hi
> > > I remember I was also facing some issues with deletes. Maybe both issues
> > > are related? After deletes I was not able to query data. At that time
> > > this issue was filed:
> > > https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
> > > Is this issue now resolved?
> > >
> > > Thanks,
> > > Jaimin
> > >
> > > > On Wed, 28 Aug 2019 at 23:29, vbal...@apache.org <vbal...@apache.org> wrote:
> > > > >
> > > > > Hi Kabeer,
> > > > > I have requested some information in the github ticket.
> > > > > Balaji.V
> > > > >
> > > > > On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > >
> > > > > Thanks for the quick response, Vinoth. That is what I would have thought:
> > > > > there is nothing complex or different in an upsert after a delete. Yes, I
> > > > > can reproduce the issue with the simple example that I wrote in the
> > > > > email.
> > > > >
> > > > > I have dug into the issue in detail and it seems to be a bug. I have filed
> > > > > it at: https://github.com/apache/incubator-hudi/issues/859
> > > > > Let me know if more information is required.
> > > > > Thank you,
> > > >
> > > > > On Aug 23 2019, at 1:37 am, Vinoth Chandar <vin...@apache.org> wrote:
> > > > > > Yes, I was asking about the HUDI storage type.
> > > > > >
> > > > > > There is nothing complex about upsert() after delete(). It is almost as if
> > > > > > a delete() for (2, vinoth) happened in between.
> > > > > >
> > > > > > Are you able to repro this literally with this tiny example with 3 records?
> > > > > > Some things to check:
> > > > > >
> > > > > > - This sequence would have created 3 commits. You can look at the commit
> > > > > > files and see if the numbers of records updated, inserted, and deleted
> > > > > > match expectations.
> > > > > > - If they do, then you can use spark.read.parquet(.) on the individual
> > > > > > parquet files and see what records they actually contain.
> > > > > >
> > > > > > This should shed some light on the pattern of failure and when exactly
> > > > > > (2, vinoth) disappeared.
> > > > > >
> > > > > > Alternatively, if you can give a small snippet that reproduces this, we
> > > > > > can debug from there.
> > > > > >
> > > > > > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > And if you meant the HUDI storage type, I have left it at the default:
> > > > > > > COW (Copy On Write).
> > > > > > >
> > > > > > > If anyone has tried this, please let me know if you have hit a similar
> > > > > > > issue. Any experience would be greatly helpful.
> > > > > > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > Hi Vinoth - thanks for the quick response.
> > > > > > > >
> > > > > > > > I have followed the mail thread for deletes ->
> > > > > > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
> > > > > > > >
> > > > > > > > For your convenience, the code that I use is below at the end of the
> > > > > > > > email. An EmptyHoodieRecordPayload is inserted for the relevant records
> > > > > > > > that need to be deleted. After the delete, I can query from Hive and
> > > > > > > > confirm that the rows intended to be deleted are no longer present, and
> > > > > > > > the records not deleted can be seen in the Hive table via Hive and Presto.
> > > > > > > > The issue starts when the upsert is done after a delete.
> > > > > > > > The storage type is S3 and I don't think there is any eventual
> > > > > > > > consistency in play, as the upserted record is visible but the old
> > > > > > > > records that weren't deleted are not visible.
> > > > > > > > And for the sake of completeness, my insert and upsert logic is based
> > > > > > > > on the code below:
> > > > > > > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > > > > > Thanks
> > > > > > > > Kabeer.
> > > > > > >
> > > > > > > > /**
> > > > > > > > * Empty payload used for deletions
> > > > > > > > */
> > > > > > > > public class EmptyHoodieRecordPayload implements
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > HoodieRecordPayload<EmptyHoodieRecordPayload>
> > > > > > > > {
> > > > > > > > public EmptyHoodieRecordPayload(GenericRecord record,
> > Comparable
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > orderingVal) { }
> > > > > > > > @Override
> > > > > > > > public EmptyHoodieRecordPayload
> > preCombine(EmptyHoodieRecordPayload
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > another) {
> > > > > > > > return another;
> > > > > > > > }
> > > > > > > > @Override
> > > > > > > > public Optional<IndexedRecord>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > > combineAndGetUpdateValue(IndexedRecord
> > > > > > >
> > > > > >
> > > > > > currentValue,
> > > > > > > > chema schema) {
> > > > > > > > return Optional.empty();
> > > > > > > > }
> > > > > > > > @Override
> > > > > > > > public Optional<IndexedRecord> getInsertValue(Schema schema)
> {
> > > > > > > > return Optional.empty();
> > > > > > > > }
> > > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > ---------- Forwarded Message ---------
> > > > > > > From: Vinoth Chandar <vin...@apache.org>
> > > > > > > Subject: Re: Upsert after Delete
> > > > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > > > To: dev@hudi.apache.org
> > > > > > >
> > > > > > > > That's interesting. Can you also share details on the storage type,
> > > > > > > > how you are issuing the deletes, and the table/view (ro, rt) that
> > > > > > > > you are querying?
> > > > > > > >
> > > > > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > > Hudi experts and Users,
> > > > > > > > > Has anyone attempted an upsert after a delete? Here is a weird thing
> > > > > > > > > that I have bumped into, and it is a shame that this came up when
> > > > > > > > > someone in the team tested this whilst I failed to run this test.
> > > > > > > > > Use case:
> > > > > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > > > > > >
> > > > > > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it
> > > > > > > > > is visible via sql through Presto/Hive.
> > > > > > > > >
> > > > > > > > > Upsert a new record into the same table (3, balaji). Query the table
> > > > > > > > > and the only record that is visible is: (3, balaji). The record
> > > > > > > > > (2, vinoth) is not displayed in the results.
> > > > > > > > >
> > > > > > > > > Any ideas on what could be at play here? Has someone done an upsert
> > > > > > > > > after a delete?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Kabeer
> > > > > > > > >
> > > > > > > > > PS: Please note that upsert functionality is well tested, and if we do
> > > > > > > > > an insert of (1, vinoth) followed by an upsert of (2, balaji), both
> > > > > > > > > records are visible. So something else is at play, and I would
> > > > > > > > > appreciate any help or insight that you experts can provide.
>
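
For reference, the insert, delete, upsert sequence from the original report at
the bottom of this thread can be modelled without Hudi. The sketch below is only
an illustration of the expected result under stated assumptions: the class name
and the helpers applyBatch and runScenario are hypothetical, the table is a plain
in-memory map, and an empty Optional stands in for the empty payload. It is not
how Hudi implements copy-on-write.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class UpsertAfterDeleteSketch {

    // Hypothetical helper (not a Hudi API): applies one batch to the table.
    // A present value upserts the key; an empty value deletes the key.
    static Map<Integer, String> applyBatch(Map<Integer, String> table,
                                           Map<Integer, Optional<String>> batch) {
        // Build a new version of the table; untouched records are carried over.
        Map<Integer, String> next = new TreeMap<>(table);
        for (Map.Entry<Integer, Optional<String>> e : batch.entrySet()) {
            if (e.getValue().isPresent()) {
                next.put(e.getKey(), e.getValue().get());
            } else {
                next.remove(e.getKey());
            }
        }
        return next;
    }

    // Replays the three steps from the report and returns the final table.
    static Map<Integer, String> runScenario() {
        Map<Integer, String> table = new TreeMap<>();

        // Step 1: insert (1, kabeer) and (2, vinoth).
        Map<Integer, Optional<String>> insert = new LinkedHashMap<>();
        insert.put(1, Optional.of("kabeer"));
        insert.put(2, Optional.of("vinoth"));
        table = applyBatch(table, insert);

        // Step 2: delete (1, kabeer) by sending an empty value for key 1.
        Map<Integer, Optional<String>> delete = new LinkedHashMap<>();
        delete.put(1, Optional.empty());
        table = applyBatch(table, delete);

        // Step 3: upsert (3, balaji).
        Map<Integer, Optional<String>> upsert = new LinkedHashMap<>();
        upsert.put(3, Optional.of("balaji"));
        table = applyBatch(table, upsert);

        return table;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints {2=vinoth, 3=balaji}
    }
}
```

In this model (2, vinoth) survives the final upsert, which is the behaviour the
report expected; the reported behaviour was that only (3, balaji) was visible.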
