Hi,

Thanks for doing that! Let me take a look and also route the PR you
opened.

I suspect that we may be configuring it to trigger on every commit, with the
policy actually deciding to compact only as needed.
Nonetheless, it's a very valuable issue to bring up. Let's continue on the
PR.
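
For reference while the PR is under discussion, here is a minimal sketch of
pinning the inline compaction threshold explicitly on the writer side instead
of relying on whatever default ships. The table name, the helper names and the
value of 10 are illustrative assumptions, and the table type key may differ on
older releases. The compaction_due function is only a simplified model of the
threshold check, not Hudi's actual code.

```python
# Sketch only: build Hudi write options that pin the inline compaction
# threshold explicitly instead of relying on the library default.

def mor_write_options(table_name: str, max_delta_commits: int = 10) -> dict:
    """Return Spark datasource options for a MERGE_ON_READ upsert.

    `table_name` and the default of 10 are illustrative assumptions.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be >= 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Assumed key name; older releases may use a different one.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


def compaction_due(delta_commits_since_compaction: int, max_delta_commits: int) -> bool:
    """Simplified model of the threshold check (hypothetical helper)."""
    return delta_commits_since_compaction >= max_delta_commits


options = mor_write_options("example_table", max_delta_commits=10)

# With a threshold of 1, every delta commit qualifies for compaction,
# which is the behaviour being questioned in this thread.
print([compaction_due(n, 1) for n in range(1, 4)])
print([compaction_due(n, 10) for n in range(1, 12)])
```

A real write would also need the record key, partition path and precombine
field options, and would pass the dictionary to the Spark writer via
options(**options) before saving to the table base path.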

Thanks
Vinoth

On Sat, May 23, 2020 at 3:18 AM Sathyaprakash G <[email protected]>
wrote:

> Hi Vinoth,
>
> Thanks for the detailed explanation. I looked into it in a little more
> detail and found that although the documentation says the default
> compaction policy runs every 10 delta commits, in the code I see the
> default is 1. I think a default of 1 is overkill, and it also makes MERGE
> ON READ behave like COPY ON WRITE, with compaction on every run. Should we
> increase the default value to more than 1?
>
>
> https://github.com/apache/incubator-hudi/blob/f34de3fb2738c8c36c937eba8df2a6848fafa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100
>
>
> https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599
>
> On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi,
> >
> > Sorry, this slipped through the cracks. By default, the compaction policy
> > would run every 10 delta commits or so.
> >
> >
> > https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction
> >
> >
> >
> > >>but in addition to new log file, i also see that corresponding parquet
> > file is also rewritten.
> > Did you also have inserts in the second delta commit? Inserts go to a new
> > parquet file, while updates go to the log as of now.
> >
> > >>My question is whether when we write update to MERGE ON READ table,
> > compaction is always called?
> > Hudi supports both sync/inline compaction, which is called with every
> > update, and async compaction, where it happens in parallel.
> > Which one applies depends on whether you use the deltastreamer or the
> > datasource. I see you are using the Spark datasource, where we only
> > support inline compaction.
> > That said, we are planning to add similar async compaction support for
> > the structured streaming sink in 0.6.0. Are you interested in that?
> >
> >
> > On Wed, May 20, 2020 at 2:10 PM Sathyaprakash G <[email protected]>
> > wrote:
> >
> > > Adding link to the images
> > >
> > > https://pasteboard.co/J9iAB10.png
> > > https://pasteboard.co/J9iB1gZ.png
> > > https://pasteboard.co/J9iBgpx.png
> > >
> > > On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I created a Merge on Read table and then tried to update a record by
> > > > writing the updated record to hudi table base path.
> > > >
> > > > When I look at the affected partition year=2020/month=05, i was
> > > > expecting just new log file with the updated record written but in
> > > > addition to new log file, i also see that corresponding parquet file
> > > > is also rewritten.
> > > >
> > > > I see that there is compaction request by looking at below file.
> > > > .hoodie/.aux/20200520195745.compaction.requested
> > > >
> > > > My question is whether when we write update to MERGE ON READ table,
> > > > compaction is always called? Or is there any setting that controls
> > > > whether to automatically call compaction on particular write.
> > > >
> > > > This is the code i ran and also i have attached few screenshots of
> > > > the written files
> > > > https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
> > > >
> > > >
> > > > --
> > > > With Regards,
> > > > Sathyaprakash G
> > > >
> > >
> > >
> > > --
> > > With Regards,
> > > Sathyaprakash G
> > >
> >
>
>
> --
> With Regards,
> Sathyaprakash G
>
