This is a COW table. So definitely different issues. If you or Balaji can pull up the JIRA, I'm happy to take a stab at it.

On Sat, Aug 31, 2019 at 10:02 AM Jaimin Shah <shahjaimin0...@gmail.com> wrote:

> Hi,
> I posted the issue on the Slack channel. Balaji looked into it, verified the
> issue, and then set up the Jira task which I shared earlier. This issue
> occurred on a MOR table when the last step was deleting data. The query after
> the delete was failing because HUDI was not able to read the schema from the
> last record, as it was written by an empty payload. I am not sure whether
> it's fixed or not now. Maybe Vinoth can confirm it.
>
> Thanks,
> Jaimin
>
> On Fri, 30 Aug 2019 at 22:30, Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> > Jaimin
> >
> > There could be many reasons why after a delete you couldn't see any data
> > or query the data. If you tell me the exact mechanism, I can help.
> >
> > On Aug 29 2019, at 7:47 am, Jaimin Shah <shahjaimin0...@gmail.com> wrote:
> > > Hi
> > > I remember I was also facing some issues with deletes. Maybe both issues
> > > are related? After deletes I was not able to query data. At that time
> > > https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
> > > was filed. Is this issue now resolved?
> > >
> > > Thanks,
> > > Jaimin
> > >
> > > On Wed, 28 Aug 2019 at 23:29, vbal...@apache.org <vbal...@apache.org> wrote:
> > >
> > > > Hi Kabeer,
> > > > I have requested some information in the github ticket.
> > > > Balaji.V
> > > >
> > > > On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > >
> > > > Thanks for the quick response Vinoth. That is what I would have
> > > > thought: that there is nothing complex or different in an upsert after
> > > > a delete. Yes, I can reproduce the issue with the simple example that
> > > > I have written in the email.
> > > >
> > > > I have dug into the issue in detail and it seems it is a bug.
> > > > I have filed it at: https://github.com/apache/incubator-hudi/issues/859
> > > > Let me know if more information is required.
> > > > Thank you,
> > > >
> > > > On Aug 23 2019, at 1:37 am, Vinoth Chandar <vin...@apache.org> wrote:
> > > > > yes. I was asking about the HUDI storage type..
> > > > >
> > > > > There is nothing complex about upsert() after delete(). It is almost
> > > > > as if a delete() for (2, vinoth) happened in between.
> > > > >
> > > > > Are you able to repro this literally with this tiny example with 3
> > > > > records? Some things to check:
> > > > >
> > > > > - This sequence would have created 3 commits. You can look at the
> > > > >   commit files and see if the number of records updated, inserted,
> > > > >   deleted match expectations.
> > > > > - If they do, then you can use spark.read.parquet(.) on the
> > > > >   individual parquet files and see what records they actually
> > > > >   contain.
> > > > >
> > > > > This should shed some light on the pattern of failure and when
> > > > > exactly (2, vinoth) disappeared.
> > > > >
> > > > > Alternatively, if you can give a small snippet that reproduces this,
> > > > > we can debug from there.
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > And if you meant the HUDI storage type, I have left it at the
> > > > > > default, COW (Copy On Write).
> > > > > >
> > > > > > If anyone has tried this, please let me know if you have hit a
> > > > > > similar issue. Any experience would be greatly helpful.
> > > > > >
> > > > > > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > Hi Vinoth - thanks for the quick response.
> > > > > > >
> > > > > > > I have followed the mail thread for deletes ->
> > > > > > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<155556722511.2660.9583626796839453...@gitbox.apache.org>
> > > > > > >
> > > > > > > For your convenience, the code that I use is below at the end of
> > > > > > > the email. EmptyHoodieRecord is inserted for the relevant records
> > > > > > > that need to be deleted. After the delete, I can query from Hive
> > > > > > > and confirm that the rows intended to be deleted are no longer
> > > > > > > present and the records not deleted can be seen in the Hive table
> > > > > > > via Hive and Presto.
> > > > > > >
> > > > > > > The issue starts when the upsert is done after a delete.
> > > > > > >
> > > > > > > The storage type is S3 and I don't think there is any eventual
> > > > > > > consistency in play, as the record upserted is visible but the
> > > > > > > old records that weren't deleted are not visible.
> > > > > > >
> > > > > > > And for the sake of completeness, my insert and upsert logic is
> > > > > > > based on the code below:
> > > > > > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kabeer.
> > > > > > >
> > > > > > > > /**
> > > > > > > >  * Empty payload used for deletions
> > > > > > > >  */
> > > > > > > > public class EmptyHoodieRecordPayload implements
> > > > > > > >     HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > > > > > >
> > > > > > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > > > > > >
> > > > > > > >   @Override
> > > > > > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > > > > > >     return another;
> > > > > > > >   }
> > > > > > > >
> > > > > > > >   @Override
> > > > > > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > > > > > > >       Schema schema) {
> > > > > > > >     return Optional.empty();
> > > > > > > >   }
> > > > > > > >
> > > > > > > >   @Override
> > > > > > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > > > > >     return Optional.empty();
> > > > > > > >   }
> > > > > > > > }
> > > > > > >
> > > > > > > ---------- Forwarded Message ---------
> > > > > > > From: Vinoth Chandar <vin...@apache.org>
> > > > > > > Subject: Re: Upsert after Delete
> > > > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > > > To: dev@hudi.apache.org
> > > > > > >
> > > > > > > That’s interesting. Can you also share details on storage type
> > > > > > > and how you are issuing the deletes and also the table/view
> > > > > > > (ro, rt) that you are querying?
> > > > > > >
> > > > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > Hudi experts and Users,
> > > > > > > > Has anyone attempted an upsert after a delete?
> > > > > > > > Here is a weird thing that I have bumped into, and it is a
> > > > > > > > shame that this has come up when someone in the team tested
> > > > > > > > this whilst I failed to run this test.
> > > > > > > >
> > > > > > > > Use case:
> > > > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > > > > >
> > > > > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth)
> > > > > > > > and it is visible via sql through Presto/Hive.
> > > > > > > >
> > > > > > > > Upsert a new record into the same table (3, balaji). Query the
> > > > > > > > table and the only record that is visible is: (3, balaji). The
> > > > > > > > record (2, vinoth) is not displayed in the results.
> > > > > > > >
> > > > > > > > Any ideas on what could be at play here? Has someone done an
> > > > > > > > upsert after a delete?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Kabeer
> > > > > > > >
> > > > > > > > PS: Please note that the upsert functionality is well tested,
> > > > > > > > and if we do a (1, vinoth) insert followed by an upsert of
> > > > > > > > (2, balaji), both records are visible. So something else is at
> > > > > > > > play, and I would appreciate any help or insight that you
> > > > > > > > experts can provide.
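
For reference, the insert, delete, upsert sequence discussed in this thread can be modeled with a small plain-Java sketch. It does not use Hudi at all: the "table" is an in-memory map, an empty Optional stands in for the empty payload used for deletes, and the class and method names (UpsertAfterDeleteSketch, applyBatch, runScenario) are hypothetical. It only illustrates the result one would expect from the three commits, namely that (2, vinoth) is still visible after the final upsert.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Toy in-memory model of the insert -> delete -> upsert sequence from this thread.
 * This is NOT Hudi code: the "table" is a plain map and all names are hypothetical.
 */
final class UpsertAfterDeleteSketch {

    /** Applies one batch: a present value upserts the key, an empty value deletes it. */
    static void applyBatch(Map<Integer, String> table, Map<Integer, Optional<String>> batch) {
        for (Map.Entry<Integer, Optional<String>> entry : batch.entrySet()) {
            if (entry.getValue().isPresent()) {
                table.put(entry.getKey(), entry.getValue().get());
            } else {
                // Assumption: an empty value plays the role of the empty delete payload.
                table.remove(entry.getKey());
            }
        }
    }

    /** Runs the three commits from the thread and returns the final table contents. */
    static Map<Integer, String> runScenario() {
        Map<Integer, String> table = new LinkedHashMap<>();

        // Commit 1: insert (1, kabeer) and (2, vinoth).
        Map<Integer, Optional<String>> insert = new LinkedHashMap<>();
        insert.put(1, Optional.of("kabeer"));
        insert.put(2, Optional.of("vinoth"));
        applyBatch(table, insert);

        // Commit 2: delete (1, kabeer) by writing an empty value for its key.
        Map<Integer, Optional<String>> delete = new LinkedHashMap<>();
        delete.put(1, Optional.empty());
        applyBatch(table, delete);

        // Commit 3: upsert (3, balaji); untouched keys must survive this commit.
        Map<Integer, Optional<String>> upsert = new LinkedHashMap<>();
        upsert.put(3, Optional.of("balaji"));
        applyBatch(table, upsert);

        return table;
    }

    public static void main(String[] args) {
        // Prints the final contents: {2=vinoth, 3=balaji}
        System.out.println(runScenario());
    }
}
```

Comparing this expected end state against the per-commit record counts and the contents of the individual parquet files, as suggested earlier in the thread, should show at which commit (2, vinoth) goes missing.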