[
https://issues.apache.org/jira/browse/HBASE-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438863#comment-13438863
]
nkeywal commented on HBASE-6490:
--------------------------------
I don't think it's an issue to increase it globally. I haven't yet looked at
the memstore flush, but I think it's going to be the same or worse: we don't
really expect a write to fail.
I need to check whether it can be a fixed value or whether we need to take into
account the replication factor or the number of machines...
> 'dfs.client.block.write.retries' value could be increased in HBase
> ------------------------------------------------------------------
>
> Key: HBASE-6490
> URL: https://issues.apache.org/jira/browse/HBASE-6490
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver
> Affects Versions: 0.96.0
> Environment: all
> Reporter: nkeywal
> Priority: Minor
>
> When allocating a new node during writing, hdfs tries
> 'dfs.client.block.write.retries' times (default 3) to write the block. Each
> time an attempt fails, it goes back to the namenode for a new list, and it
> raises an error once the number of retries is reached. In HBase, if the error
> occurs while we're writing an hlog file, it will trigger a region server abort
> (as hbase does not trust the log anymore). For the simple case (a new, and as
> such empty, log file), this seems to be ok, and we don't lose data. There
> could be some complex cases if the error occurs on an hlog file that already
> has multiple blocks written.
> Log lines are:
> "Exception in createBlockOutputStream", then "Abandoning block " followed by
> "Excluding datanode " for a retry.
> IOException: "Unable to create new block.", when the number of retries is
> reached.
> The probability of occurrence seems quite low: (number of bad nodes / number
> of nodes)^(number of retries), and it implies that you have a region server
> without its datanode. But it's per new block.
> Increasing the default value of 'dfs.client.block.write.retries' could make
> sense, to be better covered in chaotic conditions.
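
To get a feel for the estimate quoted above, here is a minimal sketch that evaluates (number of bad nodes / number of nodes)^(number of retries) for a few retry counts. The class and method names are hypothetical, the cluster numbers are made up for illustration, and the "over many blocks" figure assumes that block allocations fail independently, which the issue does not claim.

```java
public class BlockWriteFailureEstimate {

    /**
     * Per-block failure estimate from the issue description:
     * (bad nodes / total nodes)^retries.
     */
    public static double perBlockFailure(int badNodes, int totalNodes, int retries) {
        if (totalNodes <= 0 || badNodes < 0 || badNodes > totalNodes || retries < 0) {
            throw new IllegalArgumentException("invalid cluster parameters");
        }
        return Math.pow((double) badNodes / totalNodes, retries);
    }

    /**
     * Chance of at least one failure across several new blocks.
     * Assumption (not from the issue): allocations are independent.
     */
    public static double failureOverBlocks(double perBlock, long blocks) {
        return 1.0 - Math.pow(1.0 - perBlock, blocks);
    }

    public static void main(String[] args) {
        int bad = 2;    // hypothetical number of bad datanodes
        int total = 20; // hypothetical cluster size
        for (int retries : new int[] {3, 5, 10}) {
            double p = perBlockFailure(bad, total, retries);
            System.out.printf("retries=%d perBlock=%.3e over1000Blocks=%.3e%n",
                    retries, p, failureOverBlocks(p, 1000));
        }
    }
}
```

Since the estimate is per new block, the second method is only there to show how a small per-block figure can add up over the many blocks a region server writes.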