[ 
https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863685#comment-16863685
 ] 

Bing Xiao commented on HBASE-21751:
-----------------------------------

{quote}I'd change this message on commit. It's a little confusing:
abort("may lead to meta region stuck in failed open state", ex);

Why is the added Exception Serializable? We do not usually do this (look 
around).

So now, we construct WAL and then have to call init on it. Does init need to be 
added to WAL interface or is it enough just being in abstract?

Just write out success rather than have it be succ.

In the finally below, if there is an exception, we do not try to close the WAL. Should we?

} finally {
  if (!succ) {
    try {
      ...
    } catch (Throwable t) {
      throw new FailedCloseWALAfterInitializedErrorException(
        "Failed close after init wal failed.", t);
    }
  }
}

Thanks.
{quote}
[~stack] I think you are right: the exception does not need to be 
Serializable, and the message in abort("may lead to meta region stuck in 
failed open state", ex); is confusing. I have removed Serializable and changed 
the abort message.

The init method for WAL was introduced on the master branch, and I follow the same approach here.

In the finally block we try to close the WAL. If the close fails, we throw 
FailedCloseWALAfterInitializedErrorException and abort the RS, in order to 
avoid the meta region getting stuck in the failed open state.
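
A minimal sketch of the init-then-close-on-failure pattern described above. It is not the patch itself: the Wal interface, the initOrClose helper and the exception definition here are hypothetical stand-ins, and only the exception name and message are taken from the snippet under review.

```java
import java.io.Closeable;
import java.io.IOException;

class WalInitSketch {
  /** Hypothetical stand-in for a WAL with a separate init step; not the HBase interface. */
  interface Wal extends Closeable {
    void init() throws IOException;
  }

  /** Same name as in the patch under review; this definition is only an assumption. */
  static class FailedCloseWALAfterInitializedErrorException extends IOException {
    FailedCloseWALAfterInitializedErrorException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /** Initializes the WAL; if init fails, tries to close it so nothing is left behind. */
  static Wal initOrClose(Wal wal) throws IOException {
    boolean success = false;
    try {
      wal.init();
      success = true;
      return wal;
    } finally {
      if (!success) {
        try {
          wal.close();
        } catch (Throwable t) {
          // Cleanup failed as well: surface a dedicated exception so the
          // caller can abort instead of retrying against a stale file.
          throw new FailedCloseWALAfterInitializedErrorException(
              "Failed close after init wal failed.", t);
        }
      }
    }
  }

  public static void main(String[] args) {
    Wal broken = new Wal() {
      @Override public void init() throws IOException {
        throw new IOException("init failed");
      }
      @Override public void close() throws IOException {
        throw new IOException("close failed");
      }
    };
    try {
      initOrClose(broken);
    } catch (FailedCloseWALAfterInitializedErrorException e) {
      System.out.println("would abort the RS here: " + e.getMessage());
    } catch (IOException e) {
      System.out.println("init failed but close succeeded: " + e.getMessage());
    }
  }
}
```

When init fails and close succeeds, the original init exception propagates unchanged. Only a failed close is turned into the dedicated exception that the caller treats as a reason to abort.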

Thanks.

 

> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Bing Xiao
>            Priority: Major
>             Fix For: 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-21751-branch-2.1-v1.patch, 
> HBASE-21751-branch-2.1-v2.patch, HBASE-21751-branch-2.1-v3.patch, 
> HBASE-21751.patch, HBASE-21751.v2.patch, HBASE-21751.v3.patch, 
> HBASE-21751v2.patch
>
>
> When the first region opens on the RS, WALFactory creates a WAL file. If the 
> WAL creation fails, HDFS will in some cases leave an empty file in the 
> directory (e.g. the disk is full: the file is created successfully but block 
> allocation fails). AbstractFSWAL checks whether a WAL belonging to the same 
> factory already exists and throws an error if it does. Thus, the region can 
> never be opened on this RS afterwards.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>         at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
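
The failure mode in the description can be reproduced in miniature on a local filesystem. The sketch below is only an illustration under assumptions: the class, the method names and the "wal" file prefix are hypothetical, the block allocation failure is simulated, and none of it is HBase code. It shows how a leftover empty file makes every later attempt fail the existence check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LeftoverWalSketch {
  /** Returns true if any file in walDir starts with the given prefix. */
  static boolean hasWalWithPrefix(Path walDir, String prefix) throws IOException {
    try (DirectoryStream<Path> files = Files.newDirectoryStream(walDir, prefix + "*")) {
      return files.iterator().hasNext();
    }
  }

  /** Creates a WAL file unless one with the same prefix already exists. */
  static Path createWal(Path walDir, String prefix, boolean failAfterCreate)
      throws IOException {
    if (hasWalWithPrefix(walDir, prefix)) {
      throw new IOException("Target WAL already exists within directory " + walDir);
    }
    Path target = Files.createFile(walDir.resolve(prefix + ".0"));
    if (failAfterCreate) {
      // Simulated failure after the file exists: the empty file is left behind.
      throw new IOException("simulated block allocation failure");
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("wals");
    for (int attempt = 1; attempt <= 3; attempt++) {
      try {
        // Only the first attempt hits the simulated failure.
        createWal(dir, "wal", attempt == 1);
        System.out.println("attempt " + attempt + ": created");
      } catch (IOException e) {
        System.out.println("attempt " + attempt + ": " + e.getMessage());
      }
    }
  }
}
```

Without cleanup of the leftover file, the second and third attempts never reach the creation step, which is why closing the WAL after a failed init matters.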



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
