[
https://issues.apache.org/jira/browse/STORM-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388527#comment-15388527
]
Jakes commented on STORM-1971:
------------------------------
[~dossett] When I read the code, I see that each tuple is written to HDFS
one by one as it is processed, even though the filesystem sync is only
called at the tick tuple frequency. Shouldn't the write happen only for the
tuple batch at periodic intervals (perhaps the sync interval), to minimize
network cost and achieve higher throughput? Why is each tuple written one by
one to HDFS?
https://github.com/apache/storm/blob/f48d7941b10483e87a30b4849321c4dc0844a5a5/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L152
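For illustration, the batching approach being suggested could be sketched as
below. This is not the Storm API; `BatchingWriter`, `flushFn`, and the record
type are hypothetical stand-ins for the bolt's write path, where `flushFn`
would perform a single HDFS write plus hsync for the whole batch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Storm's AbstractHdfsBolt): buffer tuples in memory
// and hand the whole batch to the writer at sync time, instead of issuing a
// network write per tuple.
class BatchingWriter {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> flushFn; // stand-in for one HDFS write + hsync

    BatchingWriter(int batchSize, Consumer<List<String>> flushFn) {
        this.batchSize = batchSize;
        this.flushFn = flushFn;
    }

    // Called per tuple: only buffers in memory, no write to HDFS yet.
    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Called on the tick tuple (the periodic sync), or when the batch fills up.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushFn.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Under this sketch, four tuples with a batch size of three produce two flushes
(one of three records when the batch fills, one of the remainder at sync time)
rather than four separate writes.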
> HDFS Timed Synchronous Policy
> -----------------------------
>
> Key: STORM-1971
> URL: https://issues.apache.org/jira/browse/STORM-1971
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-hdfs
> Affects Versions: 0.10.0, 1.0.0
> Reporter: darion yaphet
> Assignee: darion yaphet
>
> When the amount of data to be written to HDFS is not very large, we
> need a timed synchronization policy to flush cached data into HDFS periodically.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)