Hi Sivaprakash,

Not an expert here either, but on your second question: yes, I believe that when writing a delta to the table you must identify the actual delta yourself and write only the new, changed, or removed records. I guess we could put in a request for Hudi to take care of this, but I see two possible issues: Hudi would need to know which columns in the table matter for the diff (or else consider all columns), and the comparison may add significant overhead.
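To make "identify the delta yourself" a bit more concrete, here is a rough sketch in plain Python. It is not Hudi or Spark API: `row_fingerprint`, `compute_delta`, and the `id` / `value` / `loaded_at` columns are all made up for illustration. The idea is to fingerprint only the columns that should count toward the diff and keep the rows whose fingerprint differs from what was written before.

```python
import hashlib
import json


def row_fingerprint(row, compare_columns):
    """Hash only the columns that should count toward the diff (hypothetical helper)."""
    payload = json.dumps([row.get(col) for col in compare_columns], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(existing_rows, incoming_rows, key_column, compare_columns):
    """Return (rows that are new or changed, keys that disappeared from the input)."""
    existing = {
        row[key_column]: row_fingerprint(row, compare_columns)
        for row in existing_rows
    }
    to_upsert = []
    seen_keys = set()
    for row in incoming_rows:
        key = row[key_column]
        seen_keys.add(key)
        # A missing key yields None, which never equals a fingerprint, so new rows are kept.
        if existing.get(key) != row_fingerprint(row, compare_columns):
            to_upsert.append(row)
    removed_keys = sorted(key for key in existing if key not in seen_keys)
    return to_upsert, removed_keys


# Step 1: 50 records written.
step1 = [{"id": i, "value": "v1", "loaded_at": "2020-07-15"} for i in range(50)]
# Step 2: the same 50 records arrive again, but only the first 10 really changed.
step2 = [
    {"id": i, "value": "v2" if i < 10 else "v1", "loaded_at": "2020-07-16"}
    for i in range(50)
]

to_upsert, removed_keys = compute_delta(step1, step2, "id", ["value"])
print(len(to_upsert), len(removed_keys))
```

Note that `loaded_at` differs on every row but is left out of `compare_columns`, which is exactly the "which columns are important" question above. On Spark DataFrames the same idea could probably be expressed with a join or `DataFrame.subtract` against the current table contents before the upsert, though that is untested here.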
Thanks,

On Thu, Jul 16, 2020, 10:01 Allen Underwood <[email protected]> wrote:

> Hi Sivaprakash,
>
> So I'm by no means an expert on this, but I think you might find what
> you're looking for here:
> https://hudi.apache.org/docs/concepts.html
>
> I'm not sure I fully understand Step 2 you mentioned - I'm writing 50
> records out of which only 10 records have been changed - does that mean
> that you updated 10 records from step 1? Or you're updating some of the
> other 40 records from step 2?
>
> Either way I guess, the key is all deltas will be written...it's after
> those records are written to disk that they are consolidated during the
> COMPACTION phase. I *BELIEVE* this is how it works.
> Take a look at COMPACTION under the timeline section here:
> https://hudi.apache.org/docs/concepts.html#timeline
>
> Hope that helps a bit.
>
> Allen
>
> On Thu, Jul 16, 2020 at 7:23 AM Sivaprakash <[email protected]> wrote:
>
>> This might be a basic question - I'm experimenting with Hudi (Pyspark). I
>> have used Insert/Upsert options to write delta into my data lake. However,
>> one is not clear to me
>>
>> Step 1:- I write 50 records
>> Step 2:- Im writing 50 records out of which only *10 records have been
>> changed* (I'm using upsert mode & tried with MERGE_ON_READ also
>> COPY_ON_WRITE)
>> Step 3: I was expecting only 10 records will be written but it writes
>> whole 50 records is this a normal behaviour? Which means do I need to
>> determine the delta myself and write them alone?
>>
>> Am I missing something?
>>
>
> --
> *Allen Underwood*
> Principal Software Engineer
> Broadcom | Symantec Enterprise Division
> *Mobile*: 404.808.5926
