[ https://issues.apache.org/jira/browse/HADOOP-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129690#comment-15129690 ]
Andrew Wang commented on HADOOP-12759:
--------------------------------------
Hi Daniel, I'm coming into this fresh, so please excuse my comments as I get up
to speed on this. Overall looks good, only nitty stuff, then some questions:
* Not a fan of a test config key since it's exposed to end users, can we use a
static variable or a VisibleForTesting setter instead? I didn't see any related
tests in HDFS-9637. I'm hoping whatever test emerges does not involve
Thread.sleep, since I hate sleeping in unit tests.
* For the probing logic: instead of trying creates until we find a free file,
should we list the directory once first? Or list it once after the first failed
create, and then probe?
* Need {{<p/>}} tags to get line breaks in class javadoc.
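To make the "list once, then probe" idea concrete, here is a rough sketch. It is not the patch's code. It assumes a hypothetical naming scheme of `base`, `base.1`, `base.2`, and so on, and it uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` API so that it can run on its own. `LogFileProber`, `nextFreeSuffix`, and `createLogFile` are made-up names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: java.nio.file stands in for Hadoop's FileSystem API,
// and the "base, base.1, base.2, ..." naming scheme is an assumption.
public class LogFileProber {

  /** One directory listing: returns the suffix after the highest one in use (0 = bare base name). */
  static int nextFreeSuffix(Path dir, String baseName) throws IOException {
    int highest = -1; // -1 means no file with this base name exists yet
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (name.equals(baseName)) {
          highest = Math.max(highest, 0);
        } else if (name.startsWith(baseName + ".")) {
          try {
            highest = Math.max(highest,
                Integer.parseInt(name.substring(baseName.length() + 1)));
          } catch (NumberFormatException ignored) {
            // Not a rotated copy of this file (e.g. "base.tmp"); skip it.
          }
        }
      }
    }
    return highest + 1;
  }

  /** Lists once, then probes upward only if a concurrent writer takes the name first. */
  static Path createLogFile(Path dir, String baseName) throws IOException {
    int suffix = nextFreeSuffix(dir, baseName);
    while (true) {
      Path candidate = dir.resolve(suffix == 0 ? baseName : baseName + "." + suffix);
      try {
        return Files.createFile(candidate); // atomic create-if-absent
      } catch (FileAlreadyExistsException e) {
        suffix++; // lost a race with another writer; try the next name
      }
    }
  }
}
```

The listing only chooses a good starting point. The create-if-absent loop still handles writers that race on the same directory, so correctness never depends on the listing being current.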
Some high-level commentary or nits:
* In the penultimate paragraph of the class javadoc, do you know why reads
fail? I'd believe {{close}} failing if the pipeline strength falls (HDFS-4504),
but reads failing after a successful close is surprising. This is generally
only an issue with small clusters.
* As an aside, since HDFS always writes one block replica to the local DN, it
can lead to skew if there are only one or a few writers. Just an FYI depending
on your use case.
> RollingFileSystemSink should eagerly rotate directories
> -------------------------------------------------------
>
> Key: HADOOP-12759
> URL: https://issues.apache.org/jira/browse/HADOOP-12759
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Attachments: YARN-4664.001.patch
>
>
> The RollingFileSystemSink only rolls over to a new directory if a new metrics
> record comes in. The issue is that HDFS does not update the file size until
> it's closed (HDFS-5478), and if no new metrics record comes in, then the file
> size will never be updated.
> This JIRA is to add a background thread to the sink that will eagerly close
> the file at the top of the hour.
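A background thread that closes the file at the top of the hour could look roughly like the sketch below. It is hypothetical and is not the code in the attached patch. `HourlyRoller` and `rollAction` are made-up names, and the roll itself is left as an opaque `Runnable`. Injecting a `java.time.Clock` is one way to let a test check the timing math without `Thread.sleep`. The sketch also assumes UTC hour boundaries, which line up with local hours only in time zones whose offset is a whole number of hours.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an eager hourly roll; not the code in the attached patch.
public class HourlyRoller {
  private final Clock clock;         // injected so tests can use Clock.fixed(...)
  private final Runnable rollAction; // e.g. close the current file so HDFS updates its size
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "hourly-roller");
        t.setDaemon(true); // don't keep the JVM alive just to roll files
        return t;
      });

  public HourlyRoller(Clock clock, Runnable rollAction) {
    this.clock = clock;
    this.rollAction = rollAction;
  }

  /** Milliseconds until the next top of the hour, using UTC hour boundaries. */
  static long millisUntilNextHour(Clock clock) {
    Instant now = clock.instant();
    Instant next = now.truncatedTo(ChronoUnit.HOURS).plus(1, ChronoUnit.HOURS);
    return Duration.between(now, next).toMillis();
  }

  public void start() {
    scheduleNext();
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  // Recompute the delay from the clock after every run instead of using a
  // fixed rate, so the roll stays pinned to the hour boundary.
  private void scheduleNext() {
    scheduler.schedule(() -> {
      try {
        rollAction.run();
      } finally {
        if (!scheduler.isShutdown()) {
          scheduleNext();
        }
      }
    }, millisUntilNextHour(clock), TimeUnit.MILLISECONDS);
  }
}
```

In a unit test, passing `Clock.fixed(...)` to `millisUntilNextHour` checks the scheduling math directly, with no sleeping.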
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)