Hi Vinoth,

Thanks for the detailed explanation. I looked into this in a little more detail and found that although the documentation says the default compaction policy runs every 10 delta commits, the default in the code is 1. I think a default of 1 is overkill: it makes MERGE ON READ behave like COPY ON WRITE, with compaction on every write. Should we increase the default to a value greater than 1?
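
To illustrate the concern, here is a minimal sketch in Python. It is a simplified model and not Hudi's actual code: it assumes inline compaction is scheduled as soon as the number of delta commits since the last compaction reaches the configured threshold. The function name `compaction_runs` is hypothetical. The two option keys in `hudi_options` are, to my understanding, the datasource properties behind inline compaction and `withMaxNumDeltaCommitsBeforeCompaction`; please verify them against the Hudi version in use.

```python
def compaction_runs(num_delta_commits, max_delta_commits):
    """Return the 1-based delta commit numbers after which compaction is scheduled.

    Simplified model (assumption): compaction is scheduled once the number of
    delta commits since the last compaction reaches max_delta_commits.
    """
    runs = []
    pending = 0  # delta commits since the last compaction
    for commit in range(1, num_delta_commits + 1):
        pending += 1
        if pending >= max_delta_commits:
            runs.append(commit)
            pending = 0
    return runs


# Writer options one might pass to override the default threshold.
# The value "10" matches the documented default, not the default in the code.
hudi_options = {
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "10",
}

# With a threshold of 1, every delta commit is followed by a compaction.
print(compaction_runs(10, 1))
# With a threshold of 10, only the 10th delta commit triggers one.
print(compaction_runs(10, int(hudi_options["hoodie.compact.inline.max.delta.commits"])))
```

Under this model a threshold of 1 rewrites the base parquet file on every update, which is the behaviour described in the original question.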

https://github.com/apache/incubator-hudi/blob/f34de3fb2738c8c36c937eba8df2a6848fafa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100
https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599

On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar <[email protected]> wrote:

> Hi,
>
> Sorry, this slipped through the cracks. By default, the compaction policy
> would run every 10 delta commits or so.
>
> https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction
>
> >>but in addition to new log file, i also see that corresponding parquet
> >>file is also rewritten.
> did you also have inserts in the second delta commit? inserts go to a new
> parquet file, while updates to go the log as of now..
>
> >>My question is whether when we write update to MERGE ON READ table,
> >>compaction is always called?
> Hudi supports both sync/inline compaction which is called with every update
> or async compaction where it happens parallely.
> So this is dependent on deltastreamer or datasource. I see you are using
> the spark datasource, we only support inline compaction on it.
> That said, we are planning to add similar async compaction support for
> structured streaming sink in 0.6.0. Are you interested in that?
>
> On Wed, May 20, 2020 at 2:10 PM Sathyaprakash G <[email protected]> wrote:
>
> > Adding link to the images
> >
> > https://pasteboard.co/J9iAB10.png
> > https://pasteboard.co/J9iB1gZ.png
> > https://pasteboard.co/J9iBgpx.png
> >
> > On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I created a Merge on Read table and then tried to update a record by
> > > writing the updated record to hudi table base path.
> > >
> > > When I look at the affected partition year=2020/month=05, i was expecting
> > > just new log file with the updated record written but in addition to new
> > > log file, i also see that corresponding parquet file is also rewritten.
> > >
> > > I see that there is compaction request by looking at below file.
> > > .hoodie/.aux/20200520195745.compaction.requested
> > >
> > > My question is whether when we write update to MERGE ON READ table,
> > > compaction is always called? Or is there any setting that controls whether
> > > to automatically call compaction on particular write.
> > >
> > > This is the code i ran and also i have attached few screenshots of the
> > > written files
> > >
> > > https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
> > >
> > > --
> > > With Regards,
> > > Sathyaprakash G
> >
> > --
> > With Regards,
> > Sathyaprakash G

--
With Regards,
Sathyaprakash G
