[
https://issues.apache.org/jira/browse/HUDI-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937330#comment-16937330
]
BALAJI VARADARAJAN commented on HUDI-269:
-----------------------------------------
[~XingXPan] :
Thank you for sharing the S3 metrics.
Can you confirm that all these requests were for writing to one table and that
no other writes happened on that bucket? The incremental timeline sync should
show benefits when you run for several iterations. I am wondering whether this
test was done for only one iteration.
Regarding the embedded timeline-server only mode, you should see a reduction
by a factor of roughly (Number of files updated)/(Number of partitions
touched).
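
As a rough illustration of that ratio, here is a small Python sketch. It assumes the ratio describes the factor by which listing-style requests drop; the function name and the numbers are made up for the example and are not measurements from this issue.

```python
def estimated_reduction_factor(files_updated: int, partitions_touched: int) -> float:
    """Back-of-the-envelope reduction factor: files updated / partitions touched.

    Assumption: this is only the rough ratio quoted above, not a formula
    taken from the Hudi code base.
    """
    if files_updated < 0:
        raise ValueError("files_updated must be non-negative")
    if partitions_touched <= 0:
        raise ValueError("partitions_touched must be positive")
    return files_updated / partitions_touched


# Hypothetical workload: 200 files updated across 10 partitions.
factor = estimated_reduction_factor(files_updated=200, partitions_touched=10)
print(f"roughly {factor:.0f}x fewer requests")
```

With these made-up numbers the estimate is about 20x. When each commit touches only one file per partition the ratio is close to 1, so little benefit would be expected.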
How many partitions does the dataset have?
If the number of partitions is large, cleaner operations could have produced
more directory listing calls when trying to find all partitions. To test this
hypothesis, please try disabling the cleaner by setting
hoodie.clean.automatic=false
Thanks,
Balaji.V
> Provide ability to throttle DeltaStreamer sync runs
> ---------------------------------------------------
>
> Key: HUDI-269
> URL: https://issues.apache.org/jira/browse/HUDI-269
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: BALAJI VARADARAJAN
> Assignee: Xing Pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: image-2019-09-25-08-51-19-686.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Copied from [https://github.com/apache/incubator-hudi/issues/922]
> In some scenarios in our cluster, we may want the delta streamer to slow down
> a bit,
> so it would be nice to have a parameter to control the minimum interval
> between syncs in continuous mode.
> This parameter defaults to 0, so it should not affect the current logic.
> Minor PR: [#921|https://github.com/apache/incubator-hudi/pull/921]
> The main reason we want to slow it down is that AWS S3 charges per
> GET/PUT/LIST request. We don't want to pay for too many requests for a
> table that changes very slowly.
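
The throttling idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DeltaStreamer implementation: `run_continuous`, `sync_once`, `min_sync_interval_seconds`, and `max_rounds` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def run_continuous(
    sync_once: Callable[[], None],
    min_sync_interval_seconds: float = 0.0,
    max_rounds: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run sync rounds so that consecutive rounds start at least
    min_sync_interval_seconds apart. Returns the number of rounds run.

    Hypothetical sketch: with the default interval of 0 the loop never
    sleeps, so the unthrottled behaviour is unchanged.
    """
    rounds = 0
    while rounds < max_rounds:
        started = clock()
        sync_once()
        rounds += 1
        # Sleep only for whatever part of the interval the sync did not use.
        remaining = min_sync_interval_seconds - (clock() - started)
        if remaining > 0 and rounds < max_rounds:
            sleep(remaining)
    return rounds
```

For a slowly changing table, a larger interval means fewer sync rounds per hour and therefore fewer S3 requests, at the cost of higher ingestion latency.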
--
This message was sent by Atlassian Jira
(v8.3.4#803005)