Jaimin

There could be many reasons why, after a delete, you couldn't see or query any
data. If you tell me the exact mechanism you used, I can help.
On Aug 29 2019, at 7:47 am, Jaimin Shah <shahjaimin0...@gmail.com> wrote:
> Hi,
> I remember I was also facing some issues with deletes. Maybe both issues
> are related? After deletes, I was not able to query the data. At that time
> https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
> was filed for this issue. Is this issue now resolved?
>
> Thanks,
> Jaimin
>
> On Wed, 28 Aug 2019 at 23:29, vbal...@apache.org <vbal...@apache.org> wrote:
> >
> > Hi Kabeer,
> > I have requested some information in the github ticket.
> > Balaji.V
> > On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> >
> > Thanks for the quick response, Vinoth. That is what I would have thought:
> > there is nothing complex or different about an upsert after a delete. Yes, I
> > can reproduce the issue with the simple example that I described in my
> > email.
> >
> > I have dug into the issue in detail and it seems to be a bug. I have filed
> > it at: https://github.com/apache/incubator-hudi/issues/859
> > Let me know if more information is required.
> > Thank you,
> >
> > On Aug 23 2019, at 1:37 am, Vinoth Chandar <vin...@apache.org> wrote:
> > > Yes, I was asking about the Hudi storage type.
> > >
> > > There is nothing complex about upsert() after delete(). It is almost as if
> > > a delete() for (2, vinoth) happened in between.
> > >
> > > Are you able to repro this literally with this tiny example of 3 records?
> > > Some things to check:
> > >
> > > - This sequence would have created 3 commits. You can look at the commit
> > >   files and see if the numbers of records updated, inserted and deleted
> > >   match expectations.
> > > - If they do, then you can use spark.read.parquet(.) on the individual
> > >   parquet files and see which records they actually contain.
> > >
> > > This should shed some light on the pattern of failure and when exactly
> > > (2, vinoth) disappeared.
> > >
> > > Alternatively, if you can give a small snippet that reproduces this, we
> > > can debug from there.
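A minimal Java sketch of the comparison suggested above, assuming you have already listed the record keys visible after each commit (for example, by reading that commit's parquet files). It replays the three operations from this thread against an in-memory map to get the expected keys per commit, then reports the first commit whose observed keys differ. The class and method names (`DisappearedRecordCheck`, `expectedKeysPerCommit`, `firstDivergentCommit`) are hypothetical, and the map only models the expected table contents, not how Hudi stores data. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DisappearedRecordCheck {

    // One write in the reproduction: an upsert (value != null) or a delete (value == null).
    record Op(int key, String value) {}

    // Replays the commits in order and returns the expected set of keys after each commit.
    static List<Set<Integer>> expectedKeysPerCommit(List<List<Op>> commits) {
        Map<Integer, String> table = new LinkedHashMap<>();
        List<Set<Integer>> result = new ArrayList<>();
        for (List<Op> commit : commits) {
            for (Op op : commit) {
                if (op.value() == null) {
                    table.remove(op.key());
                } else {
                    table.put(op.key(), op.value());
                }
            }
            result.add(new TreeSet<>(table.keySet()));
        }
        return result;
    }

    // Returns the 0-based index of the first commit whose observed keys differ from
    // the expected keys, or -1 if every commit matches.
    static int firstDivergentCommit(List<Set<Integer>> expected, List<Set<Integer>> observed) {
        for (int i = 0; i < expected.size(); i++) {
            if (!expected.get(i).equals(observed.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<List<Op>> commits = List.of(
            List.of(new Op(1, "kabeer"), new Op(2, "vinoth")), // commit 1: insert
            List.of(new Op(1, null)),                          // commit 2: delete (1, kabeer)
            List.of(new Op(3, "balaji")));                     // commit 3: upsert (3, balaji)
        List<Set<Integer>> expected = expectedKeysPerCommit(commits);
        // Observed keys as reported in this thread; replace with what the parquet files contain.
        List<Set<Integer>> observed = List.of(Set.of(1, 2), Set.of(2), Set.of(3));
        System.out.println("expected: " + expected);
        System.out.println("first divergent commit index: " + firstDivergentCommit(expected, observed));
    }
}
```

With the observations reported in this thread, the expected keys are [[1, 2], [2], [2, 3]] and the first divergent commit index is 2 (the third commit, the upsert), which points at the upsert rather than the delete.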
> > >
> > > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > And if you meant the Hudi storage type, I have left it at the default,
> > > > COW (Copy On Write).
> > > >
> > > > If anyone has tried this, please let me know whether you have hit a
> > > > similar issue. Any experience would be greatly helpful.
> > > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > Hi Vinoth - thanks for the quick response.
> > > > >
> > > > > I have followed the mail thread for deletes ->
> > > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
> > > > >
> > > > > For your convenience, the code that I use is at the end of this email.
> > > > > An EmptyHoodieRecord is inserted for each record that needs to be
> > > > > deleted. After the delete, I can query from Hive and confirm that the
> > > > > rows intended to be deleted are no longer present, and the records that
> > > > > were not deleted can be seen in the Hive table via Hive and Presto.
> > > > > The issue starts when the upsert is done after a delete.
> > > > > The storage type is S3, and I don't think eventual consistency is in
> > > > > play, because the upserted record is visible but the old records that
> > > > > weren't deleted are not.
> > > > > And for the sake of completeness, my insert and upsert logic is based
> > > > > on the code below:
> > > > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > > Thanks
> > > > > Kabeer.
> > > > >
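> > > > > > import java.util.Optional;
> > > > > >
> > > > > > import org.apache.avro.Schema;
> > > > > > import org.apache.avro.generic.GenericRecord;
> > > > > > import org.apache.avro.generic.IndexedRecord;
> > > > > > import org.apache.hudi.common.model.HoodieRecordPayload;
> > > > > >
> > > > > > /**
> > > > > >  * Empty payload used for deletions.
> > > > > >  */
> > > > > > public class EmptyHoodieRecordPayload implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > > > >
> > > > > >   // The incoming record and ordering value are ignored: a delete carries no data.
> > > > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > > > >
> > > > > >   // When two deletes for the same key arrive in one batch, either one will do.
> > > > > >   @Override
> > > > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > > > >     return another;
> > > > > >   }
> > > > > >
> > > > > >   // Returning empty tells the writer to drop the stored record for this key.
> > > > > >   @Override
> > > > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > > > > >       Schema schema) {
> > > > > >     return Optional.empty();
> > > > > >   }
> > > > > >
> > > > > >   // Returning empty means nothing is written if the key is not yet in the table.
> > > > > >   @Override
> > > > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > > >     return Optional.empty();
> > > > > >   }
> > > > > > }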
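To make the intended semantics of the payload above concrete, here is a simplified Java model of a copy-on-write file rewrite. It assumes a hypothetical `Payload` interface that mirrors only the decision part of the payload contract; it is not Hudi's `HoodieRecordPayload` and ignores Avro schemas, ordering values and file sizing. Existing records with no incoming payload are copied over unchanged, an empty result drops the stored record, and new keys are inserted. It does not claim to explain the bug; it only shows the outcome one would expect. Records require Java 16 or later.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class CowMergeSketch {

    // Hypothetical, simplified stand-in for a record payload (not Hudi's interface):
    // it only decides which value, if any, should be written for a key.
    interface Payload {
        Optional<String> combineAndGetUpdateValue(String currentValue);
        Optional<String> getInsertValue();
    }

    // Regular upsert payload: the incoming value replaces the stored one.
    record ValuePayload(String value) implements Payload {
        public Optional<String> combineAndGetUpdateValue(String currentValue) { return Optional.of(value); }
        public Optional<String> getInsertValue() { return Optional.of(value); }
    }

    // Delete payload: like EmptyHoodieRecordPayload, it always returns empty.
    record EmptyPayload() implements Payload {
        public Optional<String> combineAndGetUpdateValue(String currentValue) { return Optional.empty(); }
        public Optional<String> getInsertValue() { return Optional.empty(); }
    }

    // Rewrites one "file": existing records are carried over unless an incoming payload
    // for their key says otherwise; incoming keys not yet present are inserted.
    static Map<Integer, String> merge(Map<Integer, String> existing, Map<Integer, Payload> incoming) {
        Map<Integer, String> out = new TreeMap<>();
        for (Map.Entry<Integer, String> e : existing.entrySet()) {
            Payload p = incoming.get(e.getKey());
            if (p == null) {
                out.put(e.getKey(), e.getValue()); // untouched records must survive the rewrite
            } else {
                p.combineAndGetUpdateValue(e.getValue()).ifPresent(v -> out.put(e.getKey(), v));
            }
        }
        for (Map.Entry<Integer, Payload> e : incoming.entrySet()) {
            if (!existing.containsKey(e.getKey())) {
                e.getValue().getInsertValue().ifPresent(v -> out.put(e.getKey(), v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = merge(Map.of(),
            Map.of(1, new ValuePayload("kabeer"), 2, new ValuePayload("vinoth")));
        Map<Integer, String> v2 = merge(v1, Map.of(1, new EmptyPayload()));          // delete (1, kabeer)
        Map<Integer, String> v3 = merge(v2, Map.of(3, new ValuePayload("balaji")));  // upsert (3, balaji)
        System.out.println(v3); // prints {2=vinoth, 3=balaji}
    }
}
```

Replaying the use case from this thread through the model yields {2=vinoth, 3=balaji}, which is what a Hive/Presto query would be expected to return; seeing only (3, balaji) means an untouched record was not carried over somewhere.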
> > > > >
> > > > >
> > > > > ---------- Forwarded Message ---------
> > > > > From: Vinoth Chandar <vin...@apache.org>
> > > > > Subject: Re: Upsert after Delete
> > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > To: dev@hudi.apache.org
> > > > >
> > > > > That's interesting. Can you also share details on the storage type, how
> > > > > you are issuing the deletes, and the table/view (ro, rt) that you are
> > > > > querying?
> > > > >
> > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > Hudi experts and users,
> > > > > > Has anyone attempted an upsert after a delete? I have bumped into a
> > > > > > weird issue, and it is a shame that it only came up when someone else
> > > > > > in the team tested this, because I had failed to run this test myself.
> > > > > > Use case:
> > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth).
> > > > > >
> > > > > > Delete a record (1, kabeer). Data in the table is (2, vinoth), and it
> > > > > > is visible via SQL through Presto/Hive.
> > > > > >
> > > > > > Upsert a new record (3, balaji) into the same table. Query the table,
> > > > > > and the only record that is visible is (3, balaji). The record
> > > > > > (2, vinoth) is not displayed in the results.
> > > > > >
> > > > > > Any ideas on what could be at play here? Has anyone done an upsert
> > > > > > after a delete?
> > > > > >
> > > > > > Thanks,
> > > > > > Kabeer
> > > > > >
> > > > > > PS: Please note that upsert functionality is well tested: if we insert
> > > > > > (1, vinoth) and then upsert (2, balaji), both records are visible. So
> > > > > > something else is at play, and I would appreciate any insight that you
> > > > > > experts can provide.
> > > > >
> > > >
> > >
> >
>
>
