[ 
https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749574#comment-16749574
 ] 

Duo Zhang commented on HBASE-21751:
-----------------------------------

Is this very important to you [~allan163]? I need to revisit the code again.

IIRC, the assumption in the WAL implementation is that, if we fail to roll, 
then the RS will abort. So I'm not sure if there are other problems if we 
change the behavior.

> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21751.patch, HBASE-21751v2.patch
>
>
> During the first region opens on the RS, WALFactory will create a WAL file, 
> but if the wal creation fails, in some cases, HDFS will leave a empty file in 
> the dir(e.g. disk full, file is created succesfully but block allocation 
> fails). We have a check in AbstractFSWAL that if WAL belong to the same 
> factory exists, then a error will be throw. Thus, the region can never be 
> open on this RS later.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] 
> handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory 
> hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>         at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>         at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to