[ https://issues.apache.org/jira/browse/STORM-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389537#comment-15389537 ]
Aaron Dossett commented on STORM-1971:
--------------------------------------

Writing each tuple doesn't necessarily result in higher network costs, since caching can happen in the HDFS code. (I am not an HDFS expert; feel free to correct me if I'm wrong.) I view the sync as an outer limit on when the data is guaranteed to have been flushed and persisted to HDFS. It is possible that an alternative approach (batching the writes) would result in better performance. I'd be interested in seeing benchmarks for that.

> HDFS Timed Synchronous Policy
> -----------------------------
>
>                 Key: STORM-1971
>                 URL: https://issues.apache.org/jira/browse/STORM-1971
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-hdfs
>    Affects Versions: 0.10.0, 1.0.0
>            Reporter: darion yaphet
>            Assignee: darion yaphet
>
> When the amount of data to be written to HDFS is not very large, we
> need a timed synchronous policy to flush cached data into HDFS periodically.
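
For illustration only, here is a minimal sketch of what a time-based sync policy along the lines requested in the issue might look like. It is not code from storm-hdfs or from any patch attached to this issue. The class name TimedSyncPolicy, the injected clock, and the constructor are all hypothetical; only the mark/reset shape is loosely modeled on the existing storm-hdfs SyncPolicy interface, and the Tuple and offset arguments are dropped so the sketch stands alone without Storm on the classpath.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical time-based sync policy (an illustration, not part of storm-hdfs).
 * It reports that a sync is due once a fixed interval has elapsed since the
 * last sync, regardless of how many tuples were written in between.
 */
final class TimedSyncPolicy {
    private final long intervalMillis;
    private final LongSupplier clock;   // injected so the policy can be tested without sleeping
    private long lastSyncMillis;

    TimedSyncPolicy(long intervalMillis, LongSupplier clock) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.lastSyncMillis = clock.getAsLong();
    }

    /** Called after each write; returns true when the interval has elapsed and a sync is due. */
    boolean mark() {
        return clock.getAsLong() - lastSyncMillis >= intervalMillis;
    }

    /** Called after a sync has completed; restarts the interval. */
    void reset() {
        lastSyncMillis = clock.getAsLong();
    }
}
```

In production the clock would presumably be System::currentTimeMillis. Note that a policy consulted only when a tuple arrives cannot fire while the stream is idle, so a real implementation would also need some periodic trigger to guarantee the "outer limit" on flush time described in the comment above.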