[jira] [Commented] (HDFS-9782) RollingFileSystemSink should have configurable roll interval

Daniel Templeton (JIRA) Wed, 24 Feb 2016 09:35:13 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163374#comment-15163374
 ]


Daniel Templeton commented on HDFS-9782:
----------------------------------------

[~rkanter], thank you for the review!

bq. If the idea here is to prevent attacking HDFS with everyone rolling at the 
same time, I think the default value should not be 0. That basically negates 
what we're trying to do here.

In most clusters, this is not needed.  It's only the large (1000-ish node) 
clusters that will need to worry about staggering the rolls.  And then how much 
staggering is required depends heavily on the cluster.  I think 0 is a 
reasonable default.
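
To make the default concrete, here is a minimal sketch of how a per-host offset could stagger the rolls. It is not the code from the patch: the class name {{RollOffsetSketch}} and the method {{randomOffsetMillis}} are hypothetical, and I'm assuming the offset is a uniformly random delay below a configured maximum, with 0 meaning "no staggering".

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; names are not taken from the patch.
public class RollOffsetSketch {
    /**
     * Returns a delay in milliseconds, in the range [0, maxOffsetMillis),
     * to add to the scheduled roll time. A maximum of 0 disables
     * staggering, so every host rolls exactly on the boundary.
     */
    public static long randomOffsetMillis(long maxOffsetMillis) {
        if (maxOffsetMillis < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        if (maxOffsetMillis == 0) {
            return 0L; // assumed default: no staggering
        }
        return ThreadLocalRandom.current().nextLong(maxOffsetMillis);
    }

    public static void main(String[] args) {
        // Small cluster: keep the default and roll on the boundary.
        System.out.println(randomOffsetMillis(0));
        // Large cluster: spread the rolls over up to 30 seconds.
        System.out.println(randomOffsetMillis(30_000));
    }
}
```

With a maximum of 0 the sketch degenerates to the current behavior, which is why 0 costs small clusters nothing while large clusters can opt in.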

bq. I'm not sure we should try to conform to HDFS-9821 here at this point.

Perhaps I overstated things a little.  I was already allowing for 
user-specified units when HDFS-9821 was created.  I liked the way they proposed 
to do it better, so I changed my code to work that way instead.  I agree that 
at some point there may be some shared utils to parse the time, but I need to 
do it now regardless.

And I'm not worried about {{roll-offset-interval-millis}}.  I think that one 
should actually stay only in millis.
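
For illustration, parsing an interval with user-specified units could look roughly like the sketch below. The accepted format ("<number> <unit>"), the unit spellings, and the names {{RollIntervalSketch}} and {{parseIntervalMillis}} are all assumptions made for this example; they are not the syntax from the patch or from HDFS-9821.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; the format and names are assumptions.
public class RollIntervalSketch {
    /**
     * Parses strings such as "1 hour" or "30 minutes" into milliseconds.
     * Throws IllegalArgumentException on anything it does not recognize
     * (NumberFormatException is a subclass, so bad numbers are covered).
     */
    public static long parseIntervalMillis(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<number> <unit>': " + value);
        }
        long amount = Long.parseLong(parts[0]);
        if (amount <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + value);
        }
        switch (parts[1].toLowerCase(Locale.ROOT)) {
            case "s": case "sec": case "second": case "seconds":
                return TimeUnit.SECONDS.toMillis(amount);
            case "m": case "min": case "minute": case "minutes":
                return TimeUnit.MINUTES.toMillis(amount);
            case "h": case "hr": case "hour": case "hours":
                return TimeUnit.HOURS.toMillis(amount);
            case "d": case "day": case "days":
                return TimeUnit.DAYS.toMillis(amount);
            default:
                throw new IllegalArgumentException("unknown unit: " + parts[1]);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntervalMillis("1 hour"));
        System.out.println(parseIntervalMillis("30 minutes"));
    }
}
```

A parser like this is small enough to live in the sink for now and be swapped for shared utils later.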

bq. On a slow system or with some other delay, this could easily cause the test 
to be flakey...

I see your point, but it would have to be an enormously overloaded system.  The 
thread will run at the top of the second, so it's scheduled to run in less than 
1000ms.  If it takes more than 500ms to do the flush, things are seriously 
FUBAR.  All it's doing is closing a file.  I kinda think it should fail at that 
point.
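
The timing argument can be sketched as follows, assuming rolls are aligned to multiples of the interval since the epoch. The class {{NextRollSketch}} and the method {{millisUntilNextRoll}} are hypothetical names for this example, not the scheduling code in the patch.

```java
// Hypothetical illustration only; names are not taken from the patch.
public class NextRollSketch {
    /**
     * Returns how long to wait, in milliseconds, until the next boundary
     * that is a whole multiple of intervalMillis. A call made exactly on
     * a boundary waits a full interval.
     */
    public static long millisUntilNextRoll(long nowMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return intervalMillis - Math.floorMod(nowMillis, intervalMillis);
    }

    public static void main(String[] args) {
        // With a one-second interval, the wait is never more than 1000 ms.
        long now = System.currentTimeMillis();
        System.out.println(millisUntilNextRoll(now, 1000L));
    }
}
```

With a one-second interval the wait is at most 1000 ms, so a test that allows another 500 ms for the flush only fails if closing the file takes longer than that.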

> RollingFileSystemSink should have configurable roll interval
> ------------------------------------------------------------
>
>                 Key: HDFS-9782
>                 URL: https://issues.apache.org/jira/browse/HDFS-9782
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>         Attachments: HDFS-9782.001.patch, HDFS-9782.002.patch, 
> HDFS-9782.003.patch
>
>
> Right now it defaults to rolling at the top of every hour.  Instead that 
> interval should be configurable.  The interval should also allow for some 
> play so that all hosts don't try to flush their files simultaneously.
> I'm filing this in HDFS because I suspect it will involve touching the HDFS 
> tests.  If it turns out not to, I'll move it into common instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
