Hi
  I remember I was also facing some issues with deletes. Maybe both issues
are related? After deletes, I was not able to query data. At that time,
https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
was filed. Is this issue now resolved?

Thanks,
Jaimin

On Wed, 28 Aug 2019 at 23:29, vbal...@apache.org <vbal...@apache.org> wrote:

>
> Hi Kabeer,
> I have requested some information in the github ticket.
> Balaji.V
>
> On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> Thanks for the quick response, Vinoth. I would also have thought that
> there is nothing complex or different in an upsert after a delete. Yes, I
> can reproduce the issue with the simple example that I wrote in the
> email.
>
> I have dug into the issue in detail and it seems it is a bug. I have filed
> it at: https://github.com/apache/incubator-hudi/issues/859.
> Let me know if more information is required.
> Thank you,
>
> On Aug 23 2019, at 1:37 am, Vinoth Chandar <vin...@apache.org> wrote:
> > yes. I was asking about the HUDI storage type..
> >
> > There is nothing complex about upsert() after delete(). It is almost as if a
> > delete() for (2, vinoth) happened in between.
> >
> > Are you able to repro this literally with this tiny example with 3 records?
> > Some things to check:
> >
> > - This sequence would have created 3 commits. You can look at the commit
> > files and see if the numbers of records updated, inserted, and deleted match
> > expectations.
> > - If they do, then you can use spark.read.parquet(.) on the individual
> > parquet files and see what records they actually contain.
> >
> > This should shed some light on the pattern of failure and when exactly
> > (2, vinoth) disappeared.
> >
> > Alternatively, if you can give a small snippet that reproduces this, we can
> > debug from there.
> >
> >
> >
> >
> >
> >
> > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > And if you meant HUDI storage type, I have left it at the default, COW
> > > (Copy On Write).
> > >
> > > If anyone has tried this, please let me know if you have hit a similar
> > > issue. Any experience would be greatly helpful.
> > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > Hi Vinoth - thanks for the quick response.
> > > >
> > > > I have followed the mail thread for deletes ->
> > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
> > > >
> > > > For your convenience, the code that I use is below at the end of the
> > > > email. EmptyHoodieRecord is inserted for the relevant records that need
> > > > to be deleted. After the delete, I can query from Hive and confirm that
> > > > the rows intended to be deleted are no longer present, and the records
> > > > not deleted can be seen in the Hive table via Hive and Presto.
> > > > The issue starts when the upsert is done after a delete.
> > > > The storage type is S3, and I don't think there is any eventual
> > > > consistency in play, as the upserted record is visible but the old
> > > > records that weren't deleted are not visible.
> > > > And for the sake of completeness, my insert and upsert logic is based
> > > > on the code below:
> > > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > Thanks
> > > > Kabeer.
> > > >
> > > > > /**
> > > > >  * Empty payload used for deletions
> > > > >  */
> > > > > public class EmptyHoodieRecordPayload
> > > > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > > >
> > > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > > >
> > > > >   @Override
> > > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > > >     return another;
> > > > >   }
> > > > >
> > > > >   @Override
> > > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > > > >       Schema schema) {
> > > > >     return Optional.empty();
> > > > >   }
> > > > >
> > > > >   @Override
> > > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > >     return Optional.empty();
> > > > >   }
> > > > > }
> > > >
> > > > ---------- Forwarded Message ---------
> > > >
> > > > From: Vinoth Chandar <vin...@apache.org>
> > > > Subject: Re: Upsert after Delete
> > > > Date: Aug 22 2019, at 8:38 pm
> > > > To: dev@hudi.apache.org
> > > >
> > > > That’s interesting. Can you also share details on storage type, how
> > > > you are issuing the deletes, and the table/view (ro, rt) that you are
> > > > querying?
> > > >
> > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > Hudi experts and Users,
> > > > > Has anyone attempted an upsert after a delete? Here is a weird thing
> > > > > that I have bumped into, and it is a shame that this has come up when
> > > > > someone in the team tested this whilst I failed to run this test.
> > > > > Use case:
> > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > >
> > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it
> > > > > is visible via sql through Presto/Hive.
> > > > >
> > > > > Upsert a new record into the same table (3, balaji). Query the table
> > > > > and the only record that is visible is: (3, balaji). The record
> > > > > (2, vinoth) is not displayed in the results.
> > > > >
> > > > > Any ideas on what could be at play here? Has someone done upsert after
> > > > > delete?
> > > > >
> > > > > Thanks,
> > > > > Kabeer
> > > > >
> > > > > PS: Please note that upsert functionality is well tested, and if we
> > > > > do a (1, vinoth) insert followed by an upsert of (2, balaji), both
> > > > > records are visible. So something else is at play, and I would
> > > > > appreciate any insight that you experts can provide.
> > > >
> > >
> >
> >
>
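
For reference, the sequence reported at the start of this thread (insert two
records, delete one, upsert a third) can be modelled with a small in-memory
sketch of the behaviour the posters expected. It does not use Hudi at all: the
class name, the `Payload` interface, and the `upsert` helper are hypothetical
names made up for illustration, and an empty `Optional` simply stands in for
the empty payload quoted above. It shows the expected outcome only and is not
a description of how Hudi applies commits.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy in-memory model of the expected upsert-after-delete result. Not Hudi code. */
final class UpsertAfterDeleteSketch {

  /** Hypothetical stand-in for a record payload: an empty value means "delete this key". */
  interface Payload {
    Optional<String> value();
  }

  /** Applies one batch: non-empty payloads are written, empty payloads remove the key. */
  static void upsert(Map<Integer, String> table, Map<Integer, Payload> batch) {
    for (Map.Entry<Integer, Payload> entry : batch.entrySet()) {
      Optional<String> value = entry.getValue().value();
      if (value.isPresent()) {
        table.put(entry.getKey(), value.get());
      } else {
        table.remove(entry.getKey());
      }
    }
  }

  /** Replays the three batches from the thread and returns the final table contents. */
  static Map<Integer, String> runScenario() {
    Map<Integer, String> table = new TreeMap<>();

    // Batch 1: insert (1, kabeer) and (2, vinoth).
    Map<Integer, Payload> insert = new LinkedHashMap<>();
    insert.put(1, () -> Optional.of("kabeer"));
    insert.put(2, () -> Optional.of("vinoth"));
    upsert(table, insert);

    // Batch 2: delete (1, kabeer) by sending an empty payload for key 1.
    Map<Integer, Payload> delete = new LinkedHashMap<>();
    delete.put(1, () -> Optional.empty());
    upsert(table, delete);

    // Batch 3: upsert the new record (3, balaji).
    Map<Integer, Payload> third = new LinkedHashMap<>();
    third.put(3, () -> Optional.of("balaji"));
    upsert(table, third);

    return table;
  }

  public static void main(String[] args) {
    // Expected: key 2 survives the delete of key 1 and the later upsert of key 3.
    System.out.println(runScenario()); // prints {2=vinoth, 3=balaji}
  }
}
```

In this model (2, vinoth) is still present after the third batch, which is the
result the original poster expected to see from Hive and Presto.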
