[ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083832#comment-13083832 ]
Todd Lipcon commented on HDFS-1108:
-----------------------------------
A few questions:
- Should we syncLog() after every block? I looked at the RPC metrics of a
~150-node cluster running HBase and found that addBlock made up 3% of all
operations and 20% of write operations. The numbers of create() and addBlock()
operations are very close to each other, indicating that, at least on this
cluster, most files consist of only one block. So we could consider
piggybacking the allocation of the first block onto the create() call; then
it wouldn't cost an additional fsync to the logs (and it would improve
performance too).
- Should abandonBlock() perhaps call persistBlocks() too?
- Should we document this new flag, or treat it as an "internal" flag used only
to override the default behavior in tests? If we determine that the overhead is
small, maybe we should just always have this behavior?
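The fsync argument above can be made concrete with a toy model. This is a hypothetical sketch, not HDFS code: the class, the `Policy` names, and the assumed costs (create() and close() each sync the edit log once, addBlock() syncs only when blocks are persisted on allocation) are illustrative assumptions. It counts syncs for files written as create(), zero or more addBlock() calls, and close().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (not HDFS code) that counts edit-log syncs.
 * Assumed costs: create() and close() each sync once; addBlock() syncs
 * only when the policy persists blocks as soon as they are allocated.
 */
public class SyncCountModel {

    enum Policy {
        PERSIST_ON_HFLUSH_OR_CLOSE, // current behavior: no sync on allocation
        PERSIST_EVERY_BLOCK,        // sync after every addBlock()
        PIGGYBACK_FIRST_BLOCK       // first block allocated inside create(), later blocks synced on addBlock()
    }

    private final Policy policy;
    private final Map<String, List<Long>> fileBlocks = new HashMap<>();
    private long nextBlockId = 1;
    private int syncs = 0;

    SyncCountModel(Policy policy) {
        this.policy = policy;
    }

    void create(String path) {
        List<Long> blocks = new ArrayList<>();
        if (policy == Policy.PIGGYBACK_FIRST_BLOCK) {
            blocks.add(nextBlockId++); // persisted by the create() sync below
        }
        fileBlocks.put(path, blocks);
        syncs++;
    }

    void addBlock(String path) {
        fileBlocks.get(path).add(nextBlockId++);
        if (policy != Policy.PERSIST_ON_HFLUSH_OR_CLOSE) {
            syncs++; // persist the newly allocated block immediately
        }
    }

    void close(String path) {
        syncs++; // log the final block list of the file
    }

    /** Writes one file of numBlocks (>= 1) blocks, skipping any block create() already allocated. */
    void writeFile(String path, int numBlocks) {
        create(path);
        for (int i = fileBlocks.get(path).size(); i < numBlocks; i++) {
            addBlock(path);
        }
        close(path);
    }

    int syncs() {
        return syncs;
    }

    static int syncsFor(Policy policy, int files, int blocksPerFile) {
        SyncCountModel model = new SyncCountModel(policy);
        for (int f = 0; f < files; f++) {
            model.writeFile("/f" + f, blocksPerFile);
        }
        return model.syncs();
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            System.out.println(p + ": " + syncsFor(p, 1000, 1) + " syncs for 1000 one-block files");
        }
    }
}
```

Under these assumed costs, a one-block file needs 3 syncs with per-block persistence versus 2 today, and piggybacking the first block on create() brings it back to 2; a file with more blocks still pays one extra sync per additional block.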
> ability to create a file whose newly allocated blocks are automatically
> persisted immediately
> ---------------------------------------------------------------------------------------------
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not
> persisted in the NN transaction log when the block is allocated. Instead, an
> hflush() or a close() on the file persists the blocks into the transaction
> log. It would be nice if we could immediately persist newly allocated blocks
> (as soon as they are allocated) for specific files.
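The effect of the requested behavior can be illustrated with a second hypothetical toy model (not the NameNode implementation; the `persistOnAllocate` flag and the snapshot-style log are assumptions). Each edit-log entry stores a snapshot of one file's block list, and a NameNode restart replays the log and keeps the last snapshot.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not HDFS code) of one file under construction.
 * Each edit-log entry is a snapshot of the file's block list; replaying the
 * log after a restart keeps the last snapshot.
 */
public class PersistOnAllocateModel {

    private final boolean persistOnAllocate; // hypothetical per-file flag
    private final List<Long> inMemoryBlocks = new ArrayList<>();
    private final List<List<Long>> editLog = new ArrayList<>();
    private long nextBlockId = 1;

    PersistOnAllocateModel(boolean persistOnAllocate) {
        this.persistOnAllocate = persistOnAllocate;
        logBlocks(); // file creation is logged with an empty block list
    }

    void allocateBlock() {
        inMemoryBlocks.add(nextBlockId++);
        if (persistOnAllocate) {
            logBlocks(); // persist immediately instead of waiting for hflush()/close()
        }
    }

    void hflush() {
        logBlocks();
    }

    private void logBlocks() {
        editLog.add(new ArrayList<>(inMemoryBlocks));
    }

    /** Simulates a NameNode restart: in-memory state is lost and the log is replayed. */
    List<Long> blocksAfterRestart() {
        return new ArrayList<>(editLog.get(editLog.size() - 1));
    }

    static List<Long> runScenario(boolean persistOnAllocate) {
        PersistOnAllocateModel file = new PersistOnAllocateModel(persistOnAllocate);
        file.allocateBlock(); // block 1
        file.hflush();
        file.allocateBlock(); // block 2; the NameNode restarts before any hflush()/close()
        return file.blocksAfterRestart();
    }

    public static void main(String[] args) {
        System.out.println("default: " + runScenario(false));          // prints "default: [1]"
        System.out.println("persistOnAllocate: " + runScenario(true)); // prints "persistOnAllocate: [1, 2]"
    }
}
```

By default, block 2 exists only in NameNode memory when the restart happens, so replay recovers only block 1; with the flag set, both allocations were logged. The price in this model is one extra log entry per allocated block.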