[ https://issues.apache.org/jira/browse/HBASE-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589034#comment-13589034 ]
Himanshu Vashishtha commented on HBASE-7937:
--------------------------------------------
Thanks for taking a look.
bq. + private int logRollRetryCount;
Yes, they are set in the constructor. I will make them final, and fall back to
the default when the configured value is <= 0.
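
As a rough illustration of that fallback (not the code from the patch), the
fields could look like the sketch below. The class name RollRetrySettings, the
pauseMs field, and the default retry count of 3 are assumptions made for the
example; only logRollRetryCount and the 1 sec pause appear in this thread.

```java
// Hypothetical sketch: names and the retry default are illustrative only.
final class RollRetrySettings {
    static final int DEFAULT_RETRY_COUNT = 3;    // assumed default, not from the patch
    static final long DEFAULT_PAUSE_MS = 1000L;  // 1 sec, the pause mentioned in this thread

    final int logRollRetryCount;
    final long pauseMs;

    RollRetrySettings(int configuredRetries, long configuredPauseMs) {
        // Non-positive values are treated as "not configured" and replaced by the defaults.
        this.logRollRetryCount = configuredRetries <= 0 ? DEFAULT_RETRY_COUNT : configuredRetries;
        this.pauseMs = configuredPauseMs <= 0 ? DEFAULT_PAUSE_MS : configuredPauseMs;
    }
}
```

Because both fields are final and assigned exactly once in the constructor, the
retry loop never sees a zero or negative retry count or pause.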
bq. // there may be a case when fs has just become available; one can do one more retry
I was considering the case where an HA NN recovers in between the failed
operation and the FSUtils#checkFSAvailable call. When that happens we end up in
a state where, for example, fs.rename() threw an exception but the fs check
reports it as healthy, so we rethrow the exception to the caller. Ideally, it
should have done one more retry.
I tried to cover that case with the fsOk variable. If you think this is not
needed, I will remove it.
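
To make the fsOk idea concrete, here is a minimal Java sketch of one way such a
loop could behave. It is not the code from the patch: FsOp, FsHealthCheck, and
runWithRetries are hypothetical names, and the health check is a stand-in for
the filesystem availability check mentioned above.

```java
import java.io.IOException;

// Hypothetical sketch of "one extra retry if the fs looks healthy after a failure".
final class FsRetry {
    interface FsOp { void run() throws IOException; }
    interface FsHealthCheck { boolean isAvailable(); }

    static void runWithRetries(FsOp op, FsHealthCheck check, int maxRetries, long pauseMs)
            throws IOException, InterruptedException {
        boolean fsOk = false; // true once a failure has been followed by a healthy fs check
        int retries = 0;      // retries spent while the fs was reported unavailable
        while (true) {
            try {
                op.run();
                return;
            } catch (IOException e) {
                if (check.isAvailable()) {
                    // The fs may have recovered between the failed op and this check,
                    // so allow exactly one more attempt before giving up.
                    if (fsOk) {
                        throw e;
                    }
                    fsOk = true;
                } else {
                    // The fs is still down: pause and retry, up to maxRetries times.
                    if (retries >= maxRetries) {
                        throw e;
                    }
                    retries++;
                    Thread.sleep(pauseMs);
                }
            }
        }
    }
}
```

In this sketch a failure that persists while the fs is healthy is rethrown after
a single extra attempt, so a genuine error is not retried indefinitely.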
bq. incrementing twice.
Sorry about that. I will fix this.
bq. Default pause time:
1 sec; as defined in HConstants#DEFAULT_HBASE_SERVER_PAUSE
bq. Are we holding up all writes when we are paused like this?
I don't think we are. We retry in two places here:
a) Creating a new log writer
b) Archiving old logs
Until a new writer has been created, we don't replace the old log writer, so
we are still pointing to the old hlog.
Archiving old logs shouldn't be a blocking call. If it is, that is a bug.
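
The point about the old writer can be sketched as follows. This is a simplified
illustration under assumptions, not HLog itself: RollingLog, Writer, and
WriterFactory are hypothetical names, and the locking and syncing a real WAL
needs around the swap are left out.

```java
import java.io.IOException;

// Hypothetical sketch: the current writer is replaced only after a new one exists.
final class RollingLog {
    interface Writer { void append(String entry) throws IOException; }
    interface WriterFactory { Writer create() throws IOException; }

    private volatile Writer current;

    RollingLog(Writer initial) {
        this.current = initial;
    }

    void append(String entry) throws IOException {
        current.append(entry); // goes to whichever writer is currently installed
    }

    void rollWriter(WriterFactory factory, int maxRetries, long pauseMs)
            throws IOException, InterruptedException {
        int attempt = 0;
        while (true) {
            try {
                Writer next = factory.create(); // may fail, e.g. during an NN failover
                current = next;                 // swap only after creation succeeded
                return;
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: the old writer is still installed
                }
                attempt++;
                Thread.sleep(pauseMs);
            }
        }
    }
}
```

While rollWriter is pausing between attempts, append() keeps using the old
writer, which is the behaviour described above.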
bq. refactoring..
Will do.
TestHLogSplit passes locally. I didn't change the LogSplitter code; I tried to
keep the scope of the patch minimal.
> Retry log rolling to support HA NN scenario
> -------------------------------------------
>
> Key: HBASE-7937
> URL: https://issues.apache.org/jira/browse/HBASE-7937
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.5
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
> Fix For: 0.95.0
>
> Attachments: HBASE-7937-trunk.patch, HBASE-7937-v1.patch
>
>
> A failure in log rolling causes regionserver abort. In case of HA NN, it will
> be good if there is a retry mechanism to roll the logs.
> A corresponding jira for MemStore retries is HBASE-7507.