[
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584930#comment-13584930
]
Enis Soztutar commented on HBASE-7507:
--------------------------------------
bq. This is an important one for riding over ha nn topology changes (as per
Chunhui). Was seen on a cluster today.
As I reported in HBASE-7385, we've also seen this in NN HA tests.
bq. IMHO, this particular fix is only important if we have fixed all other
write attempts for HDFS.
We have seen another edge case: the NN dies just before returning the RPC
response for a create-file call, and the DFS client's next retry then fails
with a file-already-exists exception. I think I've logged it somewhere.
Regardless, I think fixing the memstore flush is important, since a failed
flush causes the RS to abort.
Should we commit it, and if tests start failing, fix them later?
> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
> Key: HBASE-7507
> URL: https://issues.apache.org/jira/browse/HBASE-7507
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.3
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch,
> 7507-trunkv3.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry instead to make the regionserver more stable, because
> the file system may be unavailable for a short time, e.g. while switching
> namenodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache(){
> ...
> try {
> ...
> }catch(Throwable t){
> DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
> Bytes.toStringBinary(getRegionName()));
> dse.initCause(t);
> throw dse;
> }
> ...
> }
> MemStoreFlusher#flushRegion(){
> ...
> try {
> region.flushcache();
> ...
> }catch(DroppedSnapshotException ex){
> server.abort("Replay of HLog required. Forcing server shutdown", ex);
> }
> ...
> }
> {code}
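The retry idea described in the issue can be sketched roughly as follows. This is only an illustration, not the attached patch: the class `FlushRetrySketch`, the `FlushAttempt` interface, and the `maxAttempts` and `pauseMillis` parameters are all hypothetical names, and the sketch assumes the transient failure surfaces as an `IOException`. It also does not model how the memstore snapshot would be preserved between attempts.

```java
import java.io.IOException;

// Hypothetical sketch: retry a flush a bounded number of times before giving up.
class FlushRetrySketch {

    /** Hypothetical stand-in for one flush attempt, e.g. a call to region.flushcache(). */
    @FunctionalInterface
    interface FlushAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, sleeping pauseMillis * attemptNumber
     * after each failure. Returns true on success and false once the attempts are
     * exhausted, at which point a caller could fall back to the existing abort path.
     */
    static boolean flushWithRetry(FlushAttempt attempt, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return true;
            } catch (IOException e) {
                if (i == maxAttempts) {
                    return false; // out of attempts, let the caller decide what to do
                }
                Thread.sleep(pauseMillis * i); // linear backoff before the next try
            }
        }
        return false; // only reached when maxAttempts <= 0
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulate a file system that fails twice (e.g. during a namenode switch)
        // and then recovers.
        boolean ok = flushWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated transient HDFS failure");
            }
        }, 5, 10L);
        System.out.println("flushed=" + ok + " attempts=" + calls[0]);
    }
}
```

With the simulated failure above, the third attempt succeeds, so the abort path would only be taken if all five attempts failed.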