[
https://issues.apache.org/jira/browse/HDFS-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163374#comment-15163374
]
Daniel Templeton commented on HDFS-9782:
----------------------------------------
[~rkanter], thank you for the review!
bq. If the idea here is to prevent attacking HDFS with everyone rolling at the
same time, I think the default value should not be 0. That basically negates
what we're trying to do here.
In most clusters, this is not needed. It's only the large (1000-ish node)
clusters that will need to worry about staggering the rolls. And then how much
staggering is required depends heavily on the cluster. I think 0 is a
reasonable default.
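To make the staggering idea concrete, here is a minimal sketch of how a roll time could be computed from an interval plus a random offset. This is not the code from the patch: the class {{RollSchedule}} and its method and parameter names are hypothetical, and it assumes times are epoch milliseconds and that an offset of 0 means "no staggering".

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not taken from the HDFS-9782 patch.
final class RollSchedule {
    private RollSchedule() {}

    /** Returns the first interval boundary strictly after nowMillis. */
    static long nextBoundary(long nowMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (Math.floorDiv(nowMillis, intervalMillis) + 1) * intervalMillis;
    }

    /**
     * Returns the next roll time: the next boundary plus a random offset
     * in [0, maxOffsetMillis). A maxOffsetMillis of 0 disables staggering.
     */
    static long nextRoll(long nowMillis, long intervalMillis, long maxOffsetMillis) {
        if (maxOffsetMillis < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        long boundary = nextBoundary(nowMillis, intervalMillis);
        long offset = maxOffsetMillis > 0
            ? ThreadLocalRandom.current().nextLong(maxOffsetMillis)
            : 0L;
        return boundary + offset;
    }
}
```

With an hourly interval and an offset of 0, every host computes the same roll time at the top of the hour. A non-zero offset spreads the hosts across that window, which is the part only the very large clusters should need.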
bq. I'm not sure we should try to conform to HDFS-9821 here at this point.
Perhaps I overstated things a little. I was already allowing for
user-specified units when HDFS-9821 was created. I liked the way they proposed
to do it better, so I changed my code to work that way instead. I agree that
at some point there may be some shared utils to parse the time, but I need to
do it now regardless.
And I'm not worried about {{roll-offset-interval-millis}}. I think that one
should actually stay only in millis.
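For illustration, parsing an interval with a user-specified unit could look roughly like the sketch below. The class {{IntervalParser}}, the suffix set ({{ms}}, {{s}}, {{m}}, {{h}}, {{d}}), and the choice to reject a bare number are all assumptions made for this example; they are not necessarily what the patch or HDFS-9821 does.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical parser for illustration only; the suffixes are assumed, not from the patch.
final class IntervalParser {
    private IntervalParser() {}

    /** Parses strings such as "1h", "30 m" or "15s" into milliseconds. */
    static long parseMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);

        // Split the leading digits from the trailing unit suffix.
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("No number in interval: " + text);
        }
        long value = Long.parseLong(s.substring(0, i));
        String unit = s.substring(i).trim();

        switch (unit) {
            case "ms": return value;
            case "s":  return TimeUnit.SECONDS.toMillis(value);
            case "m":  return TimeUnit.MINUTES.toMillis(value);
            case "h":  return TimeUnit.HOURS.toMillis(value);
            case "d":  return TimeUnit.DAYS.toMillis(value);
            default:
                // A missing unit is rejected here rather than guessing a default.
                throw new IllegalArgumentException("Unknown unit in interval: " + text);
        }
    }
}
```

A property that is documented as millis-only would skip this step and be read as a plain number.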
bq. On a slow system or with some other delay, this could easily cause the test
to be flaky...
I see your point, but it would have to be an enormously overloaded system. The
thread will run at the top of the second, so it's scheduled to run in less than
1000ms. If it takes more than 500ms to do the flush, things are seriously
FUBAR. All it's doing is closing a file. I kinda think it should fail at that
point.
> RollingFileSystemSink should have configurable roll interval
> ------------------------------------------------------------
>
> Key: HDFS-9782
> URL: https://issues.apache.org/jira/browse/HDFS-9782
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Attachments: HDFS-9782.001.patch, HDFS-9782.002.patch,
> HDFS-9782.003.patch
>
>
> Right now it defaults to rolling at the top of every hour. Instead that
> interval should be configurable. The interval should also allow for some
> play so that all hosts don't try to flush their files simultaneously.
> I'm filing this in HDFS because I suspect it will involve touching the HDFS
> tests. If it turns out not to, I'll move it into common instead.