[
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584902#comment-13584902
]
Himanshu Vashishtha commented on HBASE-7507:
--------------------------------------------
This patch looks safe to me (it shouldn't introduce any flakiness as such). I ran
it on Jenkins against the current 0.94 and it was green. That said, instead of
re-trying the flush operation, why not just check whether the file system is
available, retrying that check until it succeeds or gives up? That should be
more efficient. Or have you already considered that?
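The suggestion above can be sketched roughly as follows: re-check file system availability a bounded number of times instead of re-running the whole flush. This is a minimal sketch under stated assumptions. The health check is modeled as a plain BooleanSupplier. FsAvailabilityWaiter and waitForFileSystem are hypothetical names, not HBase APIs. The doubling backoff is an arbitrary choice.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HBase API): polls a file-system health check
// a bounded number of times instead of re-running the whole flush.
final class FsAvailabilityWaiter {
  private FsAvailabilityWaiter() {}

  /**
   * Returns true as soon as isAvailable reports true, or false once
   * maxAttempts checks have all failed. Between failed checks it sleeps,
   * starting at initialSleepMs and doubling each time (exponential backoff).
   * An interrupt while sleeping stops the wait and returns false.
   */
  static boolean waitForFileSystem(BooleanSupplier isAvailable,
      int maxAttempts, long initialSleepMs) {
    long sleepMs = initialSleepMs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (isAvailable.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          return false;
        }
        sleepMs *= 2;
      }
    }
    return false;
  }
}
```

On a failed flush, a caller would retry the flush once if waitForFileSystem returns true and abort as before if it returns false. That wiring depends on the region server code and is not shown here.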
The other candidates I can see in a running cluster are compaction and log
rolling. Compaction could be made to check the file system health in the same
retrying manner (if people agree, I can upload a patch for that).
Log rolling looks trickier because it involves two idempotent operations:
creating a new HLog writer and closing the existing one. Wrapping these in a
retry loop (especially creating a new HLog file in the .logs directory)
doesn't seem like a good idea, so I would avoid doing that.
Any other opinions?
> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
> Key: HBASE-7507
> URL: https://issues.apache.org/jira/browse/HBASE-7507
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.3
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch,
> 7507-trunkv3.patch
>
>
> Currently the regionserver aborts if a memstore flush throws an exception.
> I think we could retry the flush to make the regionserver more stable,
> because the file system may be unavailable only transiently, e.g. while
> switching NameNodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache() {
>   ...
>   try {
>     ...
>   } catch (Throwable t) {
>     // Any failure while flushing the snapshot is wrapped and rethrown
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
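> MemStoreFlusher#flushRegion() {
>   ...
>   try {
>     region.flushcache();
>     ...
>   } catch (DroppedSnapshotException ex) {
>     // The snapshot's edits were not persisted; only an HLog replay can
>     // recover them, so the server aborts
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }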
> {code}
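The approach the issue proposes is to retry the flush itself a bounded number of times and fall back to the abort only after that. It can be sketched like this. All types here are hypothetical, simplified stand-ins for the HBase classes in the snippet above. The attempt count and the fixed sleep are arbitrary. The sketch does not model whether re-running flushcache() after a failure is safe with respect to the region's existing snapshot.

```java
import java.io.IOException;

// Hypothetical stand-in for HBase's DroppedSnapshotException.
class DroppedSnapshotException extends IOException {
  DroppedSnapshotException(String message) {
    super(message);
  }
}

// Hypothetical, simplified region: flushcache() succeeds or reports a
// dropped snapshot.
interface FlushableRegion {
  void flushcache() throws DroppedSnapshotException;
}

// Hypothetical view of the region server's abort hook.
interface ServerAborter {
  void abort(String why, Throwable cause);
}

final class RetryingFlusher {
  private RetryingFlusher() {}

  /**
   * Tries region.flushcache() up to maxAttempts times, sleeping sleepMs
   * between attempts. Returns true on the first successful flush. If every
   * attempt fails (or the thread is interrupted while waiting), aborts the
   * server with the most recent exception and returns false.
   */
  static boolean flushRegion(FlushableRegion region, ServerAborter server,
      int maxAttempts, long sleepMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    DroppedSnapshotException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        region.flushcache();
        return true;
      } catch (DroppedSnapshotException ex) {
        last = ex;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          break; // stop retrying and fall through to the abort
        }
      }
    }
    // Once retries are exhausted, the outcome matches the original code.
    server.abort("Replay of HLog required. Forcing server shutdown", last);
    return false;
  }
}
```

The design keeps the original abort as the final fallback, so a file system that stays down still forces an HLog replay. A transient outage shorter than the retry window would no longer take the regionserver down.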