[
https://issues.apache.org/jira/browse/STORM-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388527#comment-15388527
]
Jakes commented on STORM-1971:
------------------------------
[~dossett] When I read the code, I see that each tuple is written to HDFS
one by one as it is processed, even though the filesystem sync is only
called at the tick tuple frequency. Shouldn't the write happen only for the
tuple batch at periodic intervals (perhaps the sync interval), to minimize
network cost and achieve higher throughput? Why is each tuple written one by
one to HDFS?
https://github.com/apache/storm/blob/f48d7941b10483e87a30b4849321c4dc0844a5a5/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L152
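For illustration, the batching approach being suggested could be sketched as
below. This is not the Storm API; `BatchingWriter`, `flushFn`, and the record
type are hypothetical stand-ins for the bolt's write path, where `flushFn`
would perform a single HDFS write plus hsync for the whole batch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Storm's AbstractHdfsBolt): buffer tuples in memory
// and hand the whole batch to the writer at sync time, instead of issuing a
// network write per tuple.
class BatchingWriter {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> flushFn; // stand-in for one HDFS write + hsync

    BatchingWriter(int batchSize, Consumer<List<String>> flushFn) {
        this.batchSize = batchSize;
        this.flushFn = flushFn;
    }

    // Called per tuple: only buffers in memory, no write to HDFS yet.
    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Called on the tick tuple (the periodic sync), or when the batch fills up.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushFn.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Under this sketch, four tuples with a batch size of three produce two flushes
(one of three records when the batch fills, one of the remainder at sync time)
rather than four separate writes.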
> HDFS Timed Synchronous Policy
> -----------------------------
>
> Key: STORM-1971
> URL: https://issues.apache.org/jira/browse/STORM-1971
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-hdfs
> Affects Versions: 0.10.0, 1.0.0
> Reporter: darion yaphet
> Assignee: darion yaphet
>
> When the amount of data to be written to HDFS is not very large, we
> need a timed synchronization policy to flush cached data into HDFS periodically.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)