[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157250#comment-13157250
 ] 

Ted Yu commented on HBASE-4862:
-------------------------------

Log snippets from Chunhui.
Region C was 3591e9867a4c125493dc82168854ea0c
File F was 0000000013156791680

Master log:
{code}
2011-11-16 11:47:23,134 INFO org.apache.hadoop.hbase.master.ServerManager:
  Triggering server recovery; existingServer serverB,60020,1321415172631 looks 
stale
  2011-11-16 11:47:23,134 DEBUG org.apache.hadoop.hbase.master.ServerManager:
  Added=serverB,60020,1321415172631 to dead servers, submitted shutdown handler 
to be executed, root=false, meta=true

  2011-11-16 11:47:29,305 INFO org.apache.hadoop.hbase.master.ServerManager:
  Triggering server recovery; existingServer serverA,60020,1321415179549 looks 
stale
  2011-11-16 11:47:29,305 DEBUG org.apache.hadoop.hbase.master.ServerManager:
  Added=serverA,60020,1321415179549 to dead servers, submitted shutdown handler 
to be executed, root=false, meta=false

  2011-11-16 11:48:28,700 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Splitting 28 hlog(s) in 
hdfs://serverX:9000/hbase-common/.logs/serverB,60020,1321414043798

  2011-11-16 11:48:30,657 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Creating writer 
path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103
 region=3591e9867a4c125493dc82168854ea0c

  2011-11-16 11:49:17,855 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Closed path 
hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103
 (wrote 75875 edits in 3228ms)

  2011-11-16 11:49:19,629 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Splitting 28 hlog(s) in 
hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134

  2011-11-16 11:49:20,650 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter:
  Creating writer 
path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
 region=3591e9867a4c125493dc82168854ea0c

  2011-11-16 11:49:36,731 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager:
  Assigning region 
writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.
 to serverD,60020,1321415224381

  2011-11-16 11:49:49,755 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.
 on serverD,60020,1321415224381

  2011-11-16 11:50:13,030 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception: org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
 File does not exist.

  2011-11-16 11:50:13,037 FATAL 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while 
writing log entry to log
  org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
 File does not exist.

  2011-11-16 11:50:13,051 ERROR 
org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting 
hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134
{code}
Log from region server D:
{code}
2011-11-16 11:49:36,730 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
region: 
writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.

2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion:
Failed delete of 
hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
 
2011-11-16 11:49:49,733 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined 
writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.;
 next sequenceid=13160672878
{code}

                
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
> trunk.diff
>
>
> Case Description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. A region server is opening region A at the same time, and during 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, the file is added to the corrupt logs. However, 
> data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files will also be lost.
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split-hlog thread.
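
One possible way to keep a region open from deleting a file that the splitter is still appending to is sketched below. This is only an illustration under stated assumptions, not the attached patch: it uses the local filesystem through java.nio.file rather than HDFS, and the class name, the method names and the ".temp" suffix are all hypothetical. The splitter writes under a temporary name and renames the file once it is closed; the region-open side only considers files without the temporary suffix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecoveredEditsSketch {
    // Hypothetical marker for a file the splitter has not finished writing.
    static final String IN_PROGRESS_SUFFIX = ".temp";

    /** Splitter side: create the edits file under a temporary name. */
    static Path createInProgress(Path editsDir, String seqId) throws IOException {
        Files.createDirectories(editsDir);
        return Files.createFile(editsDir.resolve(seqId + IN_PROGRESS_SUFFIX));
    }

    /** Splitter side: append one (stand-in) log entry to the in-progress file. */
    static void append(Path inProgress, String edit) throws IOException {
        Files.write(inProgress, (edit + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
    }

    /** Splitter side: publish the finished file by renaming it to its final name. */
    static Path finish(Path inProgress) throws IOException {
        String name = inProgress.getFileName().toString();
        String finalName = name.substring(0, name.length() - IN_PROGRESS_SUFFIX.length());
        return Files.move(inProgress, inProgress.resolveSibling(finalName),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Region-open side: only finished files may be replayed and then deleted. */
    static List<Path> replayable(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(editsDir)) {
            for (Path p : stream) {
                if (!p.getFileName().toString().endsWith(IN_PROGRESS_SUFFIX)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("region").resolve("recovered.edits");
        Path tmp = createInProgress(dir, "0000000013156791680");
        append(tmp, "edit-1");
        System.out.println("visible while splitting: " + replayable(dir).size());
        finish(tmp);
        System.out.println("visible after close: " + replayable(dir).size());
    }
}
```

With this arrangement a concurrent region open never sees, and so never deletes, a file that still has an open writer. Whether a rename gives the same guarantee on HDFS in the affected versions would need to be checked separately.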


        
