Hi,

Thanks for doing that! Let me take a look and also route the PR you opened.
I suspect we may be configuring it to trigger on every commit, with the policy actually deciding to compact only as needed. Nonetheless, it's a very valuable issue to bring up. Let's continue on the PR.

Thanks,
Vinoth

On Sat, May 23, 2020 at 3:18 AM Sathyaprakash G <[email protected]> wrote:
> Hi Vinoth,
>
> Thanks for the detailed explanation. I looked into this in a little more
> detail and found that although the documentation says the default
> compaction policy runs every 10 delta commits, the default in the code
> is 1. I think a default value of 1 is a little overkill, and it also
> makes MERGE ON READ behave like COPY ON WRITE, with compaction on every
> run. Should we increase the default value to more than 1?
>
> https://github.com/apache/incubator-hudi/blob/f34de3fb2738c8c36c937eba8df2a6848fafa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100
>
> https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599
>
> On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi,
> >
> > Sorry, this slipped through the cracks. By default, the compaction policy
> > would run every 10 delta commits or so.
> >
> > https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction
> >
> > >>but in addition to the new log file, I also see that the corresponding
> > >>parquet file is rewritten.
> >
> > Did you also have inserts in the second delta commit? Inserts go to a new
> > parquet file, while updates go to the log, as of now.
> >
> > >>My question is: when we write an update to a MERGE ON READ table, is
> > >>compaction always called?
> >
> > Hudi supports both sync/inline compaction, which is called with every
> > update, and async compaction, which runs in parallel. So this depends on
> > whether you use DeltaStreamer or the datasource. I see you are using the
> > Spark datasource, which only supports inline compaction.
> > That said, we are planning to add similar async compaction support for
> > the Structured Streaming sink in 0.6.0. Are you interested in that?
> >
> > On Wed, May 20, 2020 at 2:10 PM Sathyaprakash G <[email protected]> wrote:
> >
> > > Adding links to the images:
> > >
> > > https://pasteboard.co/J9iAB10.png
> > > https://pasteboard.co/J9iB1gZ.png
> > > https://pasteboard.co/J9iBgpx.png
> > >
> > > On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I created a Merge on Read table and then tried to update a record by
> > > > writing the updated record to the Hudi table base path.
> > > >
> > > > When I look at the affected partition year=2020/month=05, I was
> > > > expecting just a new log file with the updated record, but in
> > > > addition to the new log file, I also see that the corresponding
> > > > parquet file is rewritten.
> > > >
> > > > Looking at the file below, I see that there is a compaction request:
> > > > .hoodie/.aux/20200520195745.compaction.requested
> > > >
> > > > My question is: when we write an update to a MERGE ON READ table, is
> > > > compaction always called? Or is there a setting that controls whether
> > > > compaction is automatically called on a particular write?
> > > >
> > > > This is the code I ran; I have also attached a few screenshots of the
> > > > written files:
> > > >
> > > > https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
> > > >
> > > > --
> > > > With Regards,
> > > > Sathyaprakash G
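The following is a simplified, hypothetical model of the default-of-1 concern. It is not Hudi's actual code. In the model, the trigger is checked after every delta commit, but compaction runs only once `max_delta_commits` delta commits have accumulated since the last compaction. `InlineCompactionModel`, `compactions_after` and `inline_compaction_options` are illustrative names. The two option keys are assumed to correspond to the inline compaction settings on the configuration page linked above, so please check them against the Hudi version you run.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InlineCompactionModel:
    """Hypothetical, simplified model of a 'compact after N delta commits' trigger.

    This is NOT Hudi's implementation. It only mirrors the idea that the
    policy is checked after every delta commit, while a compaction actually
    runs only once `max_delta_commits` delta commits have accumulated.
    """

    max_delta_commits: int
    delta_commits_since_compaction: int = 0
    timeline: List[str] = field(default_factory=list)

    def write(self) -> None:
        # Model assumption: every write to a MERGE_ON_READ table is one delta commit.
        self.timeline.append("deltacommit")
        self.delta_commits_since_compaction += 1
        # The threshold is checked after every commit...
        if self.delta_commits_since_compaction >= self.max_delta_commits:
            # ...but a compaction is only recorded once the threshold is reached.
            self.timeline.append("compaction")
            self.delta_commits_since_compaction = 0


def compactions_after(writes: int, max_delta_commits: int) -> int:
    """Count how many compactions the model performs over `writes` writes."""
    model = InlineCompactionModel(max_delta_commits=max_delta_commits)
    for _ in range(writes):
        model.write()
    return model.timeline.count("compaction")


def inline_compaction_options(max_delta_commits: int) -> Dict[str, str]:
    """Assumed Hudi writer option keys; verify against your Hudi version."""
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be >= 1")
    return {
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


if __name__ == "__main__":
    for n in (1, 10):
        print(f"max_delta_commits={n}: {compactions_after(20, n)} compactions in 20 writes")
```

With a threshold of 1, every one of the 20 writes compacts, which is effectively COPY ON WRITE behaviour. With 10, only two writes do. The options dictionary would be passed to the Spark writer along with the usual table options, for example `df.write.format("org.apache.hudi").options(**opts)`.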
