[
https://issues.apache.org/jira/browse/HADOOP-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584353#action_12584353
]
stack commented on HADOOP-3113:
-------------------------------
> I can make the configurable to be per file, but maybe it makes more sense to
> make it applicable to the entire system.
When the setting is not per file:
1. HBase is often the new kid on the block and but one of many users of an HDFS
install. HBase installers may find it difficult to convince HDFS admins they
need to make the change.
2. If per-file, HBase can manage the configuration itself. Otherwise, it's a
two-step process, and HBase installers may plain forget.
3. Minor point: HBase needs append only for its Write-Ahead Log, nowhere else.
If it's an 'entire system' setting, will it require an HDFS restart to take
effect?
Sounds like we should set the block size for our Write-Ahead Log to be a good
deal smaller than the default.
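One way to sketch that is a client-side override in HBase's own configuration; dfs.block.size is the client property of this era, and the 1 MB value below is purely illustrative, not a recommendation:

```xml
<!-- Hypothetical hbase-site.xml fragment: shrink the block size used by
     the client that writes the Write-Ahead Log. Value is illustrative. -->
<property>
  <name>dfs.block.size</name>
  <value>1048576</value>
</property>
```

Note this affects every file that client creates; if that is too broad, FileSystem.create() also accepts a per-file block size argument.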
I took a quick look at the patch.
The hadoop-default.xml entry description is all on one line. You might want to
break it up. Also, is there a downside to setting the dfs.datanode.skipTmpFile
flag (Reading the description, in my head I'm thinking there must be or why
even bother with this configuration?)
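For example, the entry might be broken up along these lines; the description wording here is my own paraphrase of what the flag appears to do, not text taken from the patch:

```xml
<property>
  <name>dfs.datanode.skipTmpFile</name>
  <value>false</value>
  <description>If true, datanodes write incoming block data directly to the
  real block file instead of to a tmp file that is renamed when the block is
  closed. Data flushed by a client then survives a datanode or client crash
  that happens before the block completes.
  </description>
</property>
```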
Otherwise, patch looks good to me.
Do you have a suggestion for a test I might run to exercise this new
functionality?
Thanks, Dhruba.
> Provide a configurable way for DFSOutputStream.flush() to flush data to real
> block file on DataNode.
> ---------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3113
> URL: https://issues.apache.org/jira/browse/HADOOP-3113
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: noTmpFile.patch
>
>
> DFSOutputStream has a method called flush() that persists block locations on
> the namenode and sends all outstanding data to all datanodes in the pipeline.
> However, this data goes to the tmp file on the datanode(s). When the block is
> closed, the tmp file is renamed to be the real block file. If the
> datanode(s) die before the block is complete, the entire block is lost. This
> behaviour will be fixed in HADOOP-1700.
> However, in the short term, a configuration parameter can be used to allow
> datanodes to write to the real block file directly, thereby avoiding writing
> to the tmp file. This means that data that is flushed successfully by a
> client does not get lost even if the datanode(s) or client dies.
> The Namenode already has code to pick the largest replica (if multiple
> datanodes have different sizes of this block). Also, the namenode has code to
> not trigger a replication request if the file is still being written to.
> The only caveat that I can think of is that the block report periodicity
> should be much, much smaller than the lease timeout period. A block report
> adds the being-written-to blocks to the blocksMap thereby avoiding any
> cleanup that a lease expiry processing might have otherwise done.
> Not all requirements specified by HADOOP-1700 are supported by this approach,
> but it could still be helpful (in the short term) for a wide range of
> applications.
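On the block-report caveat above: the report interval is already configurable, so a deployment using this feature could shrink it well below the lease timeout. A hypothetical hadoop-site.xml override, with a one-minute value chosen only for illustration:

```xml
<!-- Hypothetical override: report blocks every minute instead of the
     default hour, so being-written-to blocks reach the blocksMap well
     before any lease expiry processing could clean them up. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>60000</value>
</property>
```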
--
This message is automatically generated by JIRA.