[jira] [Commented] (HDFS-9782) RollingFileSystemSink should have configurable roll interval

Andrew Wang (JIRA) Wed, 24 Feb 2016 16:32:33 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166420#comment-15166420
 ]


Andrew Wang commented on HDFS-9782:
-----------------------------------

bq. My concern is that the offset interval alters when the metrics are reliably 
available. I think it violates the principle of least astonishment to have the 
metrics randomly (literally) show up late by default. I would rather it not be 
on unless it's needed, and the user turns it on explicitly.

Is it that weird? You just need to poll {{offset}} after the scheduled flush 
time. You also always need to be able to deal with late data, since the flush 
could pause or be delayed for other reasons too (e.g. GC pause).
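
To make the polling point concrete, here is a minimal sketch of the timing 
involved. It is not the code from the attached patches; the class and method 
names ({{RollSchedule}}, {{nextBoundary}}, {{nextRoll}}, {{safePollTime}}) 
are hypothetical, and it assumes the offset is drawn uniformly from 
[0, maxOffset) and added to an interval-aligned boundary.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not part of Hadoop.
class RollSchedule {

    // Next interval-aligned boundary strictly after nowMillis
    // (assumes nowMillis >= 0 and intervalMillis > 0).
    static long nextBoundary(long nowMillis, long intervalMillis) {
        return (nowMillis / intervalMillis + 1) * intervalMillis;
    }

    // Time at which one host would roll: the boundary plus a random
    // offset in [0, maxOffsetMillis), so hosts do not all flush at once.
    static long nextRoll(long nowMillis, long intervalMillis,
                         long maxOffsetMillis, Random rng) {
        long jitter = maxOffsetMillis > 0
            ? (long) (rng.nextDouble() * maxOffsetMillis)
            : 0L;
        return nextBoundary(nowMillis, intervalMillis) + jitter;
    }

    // Earliest time a consumer can poll and expect every host to have
    // rolled, ignoring other delays such as GC pauses.
    static long safePollTime(long nowMillis, long intervalMillis,
                             long maxOffsetMillis) {
        return nextBoundary(nowMillis, intervalMillis) + maxOffsetMillis;
    }

    public static void main(String[] args) {
        long interval = TimeUnit.HOURS.toMillis(1);
        long maxOffset = TimeUnit.MINUTES.toMillis(1);
        long now = System.currentTimeMillis();

        System.out.println("boundary:  " + nextBoundary(now, interval));
        System.out.println("host roll: "
            + nextRoll(now, interval, maxOffset, new Random()));
        System.out.println("poll at:   "
            + safePollTime(now, interval, maxOffset));
    }
}
```

Under these assumptions a consumer only has to shift its poll time by the 
maximum offset, and it still needs to tolerate data arriving later than that.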

I'm still not entirely clear on the requirements, since I can't think of other 
windowed metrics that we try to synchronize cluster wide. What kind of 
timeliness do we really require? Would it be acceptable if we did not 
synchronize rolling, but rolled more frequently?

bq. What's the alternative? I don't think millis is an acceptable unit for 
something that will likely be hours or days.

I did some JIRA searching, and found HADOOP-8608 which I didn't realize was 
available. Is this what we want?

> RollingFileSystemSink should have configurable roll interval
> ------------------------------------------------------------
>
>                 Key: HDFS-9782
>                 URL: https://issues.apache.org/jira/browse/HDFS-9782
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>         Attachments: HDFS-9782.001.patch, HDFS-9782.002.patch, 
> HDFS-9782.003.patch, HDFS-9782.004.patch
>
>
> Right now it defaults to rolling at the top of every hour.  Instead that 
> interval should be configurable.  The interval should also allow for some 
> play so that all hosts don't try to flush their files simultaneously.
> I'm filing this in HDFS because I suspect it will involve touching the HDFS 
> tests.  If it turns out not to, I'll move it into common instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
