[jira] [Comment Edited] (HBASE-21751) WAL creation fails during region open may cause region assign forever fail

Bing Xiao (JIRA) Tue, 07 May 2019 05:31:59 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812954#comment-16812954
 ]


Bing Xiao edited comment on HBASE-21751 at 5/7/19 12:30 PM:
------------------------------------------------------------

{noformat}
I think the root problem here is that we throw an exception in a constructor 
and do not do any cleanup when the constructor fails. On the master branch, 
IIRC, I introduced an init method for the WAL that creates the first writer. 
So maybe we could use the same pattern for branch-2.x: when the init method 
throws an exception, we close the WAL to clean up, and if the close also 
fails, we abort the RS. What do you think?{noformat}
[~Apache9] Following your idea, I submitted a patch: when the init method 
throws an exception, we try to close the WAL to clean up, and if the close 
fails, we abort the RS. What do you think of this patch? Many UTs timed out 
in the run; I don't know why, but it seems unrelated to the patch.
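
For reference, a minimal sketch of the init/close/abort pattern described above. It is not the patch itself: the Wal and Abortable interfaces and the createWal helper are hypothetical stand-ins, not the real HBase classes.

```java
import java.io.IOException;

public class WalInitSketch {
  /** Hypothetical stand-in for a WAL whose first writer is created in init(). */
  public interface Wal {
    void init() throws IOException;
    void close() throws IOException;
  }

  /** Hypothetical stand-in for the region server's abort hook. */
  public interface Abortable {
    void abort(String why, Throwable cause);
  }

  /**
   * Initializes the WAL. If init() fails, tries to close the WAL so that
   * nothing is left half-created; if close() fails as well, asks the
   * server to abort. The original init failure is always rethrown.
   */
  public static Wal createWal(Wal wal, Abortable server) throws IOException {
    try {
      wal.init();
      return wal;
    } catch (IOException initFailure) {
      try {
        wal.close();
      } catch (IOException closeFailure) {
        // Keep both failures visible to whoever handles the exception.
        initFailure.addSuppressed(closeFailure);
        server.abort("WAL close failed after init failure", closeFailure);
      }
      throw initFailure;
    }
  }
}
```

The point of moving the work out of the constructor is that the caller still holds a reference to the WAL when init() fails, so it has something to call close() on.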

[~allan163] What do you think of this patch? Do you see any problems?

Thanks



> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Bing Xiao
>            Priority: Major
>             Fix For: 2.0.6, 2.1.5, 2.2.1
>
>         Attachments: HBASE-21751-branch-2.1-v1.patch, 
> HBASE-21751-branch-2.1-v2.patch, HBASE-21751.patch, HBASE-21751.v2.patch, 
> HBASE-21751v2.patch
>
>
> When the first region opens on the RS, WALFactory creates a WAL file. If 
> the WAL creation fails, in some cases HDFS leaves an empty file in the 
> directory (e.g. the disk is full: the file is created successfully but 
> block allocation fails). AbstractFSWAL checks whether a WAL belonging to 
> the same factory already exists and, if so, throws an error. Thus, the 
> region can never be opened on this RS afterwards.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>         at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
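
The failure mode in the description can be modelled with a small local sketch. This is a simplified illustration under stated assumptions, not the AbstractFSWAL code: it uses the local filesystem instead of HDFS, and the openWal method, the prefix parameter, and the simulated failure flag are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleWalSketch {
  /**
   * Simplified model of opening a WAL: refuse to proceed if a file with the
   * same prefix already exists in walDir, otherwise create a new file.
   * failAfterCreate simulates a failure that happens after the file has
   * been created (e.g. block allocation failing on a full disk).
   */
  public static Path openWal(Path walDir, String prefix, boolean failAfterCreate)
      throws IOException {
    try (DirectoryStream<Path> existing = Files.newDirectoryStream(walDir, prefix + "*")) {
      if (existing.iterator().hasNext()) {
        throw new IOException("Target WAL already exists within directory " + walDir);
      }
    }
    Path walFile = Files.createFile(walDir.resolve(prefix + "." + System.nanoTime()));
    if (failAfterCreate) {
      // The empty file stays behind because nothing cleans it up.
      throw new IOException("simulated block allocation failure");
    }
    return walFile;
  }
}
```

In this model, a first call with failAfterCreate set to true leaves an empty file behind, and every later call for the same directory and prefix fails the existence check, which mirrors the "region can never be opened" behaviour reported here.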



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
