[jira] [Commented] (STORM-1971) HDFS Timed Synchronous Policy

Aaron Dossett (JIRA) Fri, 22 Jul 2016 06:57:37 -0700

    [ https://issues.apache.org/jira/browse/STORM-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389537#comment-15389537 ]


Aaron Dossett commented on STORM-1971:
--------------------------------------

Writing each tuple individually doesn't necessarily result in higher network
costs, since the HDFS client code can buffer writes before sending them. (I am
not an HDFS expert; feel free to correct me if I'm wrong.) I view the sync as
an outer limit: the point by which the data is guaranteed to have been flushed
and persisted to HDFS.
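
To illustrate the buffering point, here is a minimal sketch using plain java.io rather than the HDFS client: `java.io.BufferedOutputStream` stands in for whatever buffering the HDFS output stream does, and `CountingSink` and `BufferedTupleWriter` are hypothetical names. Each tuple is written individually, but the underlying stream only sees bytes when the buffer is flushed (or fills up).

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Stand-in for the network side: records received bytes and counts the write calls that reach it. */
class CountingSink extends OutputStream {
    final ByteArrayOutputStream received = new ByteArrayOutputStream();
    int writeCalls = 0;

    @Override
    public void write(int b) {
        writeCalls++;
        received.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        writeCalls++;
        received.write(b, off, len);
    }
}

/** Hypothetical writer: one write() per tuple; bytes leave the buffer only on sync() or when it fills. */
class BufferedTupleWriter {
    private final OutputStream out;

    BufferedTupleWriter(OutputStream sink, int bufferBytes) {
        this.out = new BufferedOutputStream(sink, bufferBytes);
    }

    void writeTuple(String record) {
        try {
            // Handed to the stream immediately, but only copied into the in-memory buffer.
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void sync() {
        try {
            // Pushes everything buffered so far to the sink in a single call.
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 4096-byte buffer, writing 100 short records and then calling `sync()` results in a single write call on the sink. The real HDFS client has its own packet-based buffering that differs in detail, so this only models the general idea that per-tuple writes need not mean per-tuple network traffic.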

It is possible that an alternative approach (batching the writes) would result 
in better performance -- I'd be interested in seeing benchmarks for that.

> HDFS Timed Synchronous Policy
> -----------------------------
>
>                 Key: STORM-1971
>                 URL: https://issues.apache.org/jira/browse/STORM-1971
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-hdfs
>    Affects Versions: 0.10.0, 1.0.0
>            Reporter: darion yaphet
>            Assignee: darion yaphet
>
> When the amount of data to be written to HDFS is small, we need a timed
> sync policy that periodically flushes cached data into HDFS.
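
A minimal sketch of what a timed policy could look like, assuming a simplified interface loosely modeled on the mark/reset shape of storm-hdfs's `SyncPolicy` (the storm-hdfs `mark` also receives the `Tuple`). `SimpleSyncPolicy` and `TimedSyncPolicy` are hypothetical names, and the clock is injected so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified sync-policy interface (loosely modeled on storm-hdfs's mark/reset shape). */
interface SimpleSyncPolicy {
    /** Called after each write; returns true when the writer should sync now. */
    boolean mark(long offset);

    /** Called by the writer right after it has synced. */
    void reset();
}

/** Hypothetical policy: requests a sync once maxIntervalMillis has elapsed since the last sync. */
class TimedSyncPolicy implements SimpleSyncPolicy {
    private final long maxIntervalMillis;
    private final LongSupplier clockMillis; // injectable so tests don't have to sleep
    private long lastSyncMillis;

    TimedSyncPolicy(long maxIntervalMillis, LongSupplier clockMillis) {
        if (maxIntervalMillis <= 0) {
            throw new IllegalArgumentException("maxIntervalMillis must be positive");
        }
        this.maxIntervalMillis = maxIntervalMillis;
        this.clockMillis = clockMillis;
        this.lastSyncMillis = clockMillis.getAsLong();
    }

    /** Uses a monotonic clock, so wall-clock adjustments don't trigger or suppress syncs. */
    TimedSyncPolicy(long maxIntervalMillis) {
        this(maxIntervalMillis, () -> System.nanoTime() / 1_000_000L);
    }

    @Override
    public boolean mark(long offset) {
        // offset is ignored here; a count- or size-based policy would use it.
        return clockMillis.getAsLong() - lastSyncMillis >= maxIntervalMillis;
    }

    @Override
    public void reset() {
        lastSyncMillis = clockMillis.getAsLong();
    }
}
```

One caveat with this design: `mark` is only evaluated when a tuple arrives, so during a quiet period the last records stay unsynced until the next tuple shows up. Covering that case would need an independent periodic trigger, for example Storm's tick tuples, that performs the sync and calls `reset()` once the interval has passed.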



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
