[ https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052644#comment-16052644 ]

stack commented on HBASE-18152:
-------------------------------

Looks like we have a version of this problem in branch-1 too. This is from 
[~tsuna]'s 1.3.1 log:

{code}
2017-06-09 01:03:34,499 ERROR [r12s3:9102.activeMasterManager] procedure2.ProcedureExecutor: corrupted procedure: Procedure=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure (id=96, owner=, state=RUNNABLE, startTime=6480hrs, 32mins, 51sec ago, lastUpdate=6480hrs, 32mins, 51sec ago)
2017-06-09 01:03:34,499 ERROR [r12s3:9102.activeMasterManager] procedure2.ProcedureExecutor: corrupted procedure: Procedure=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure (id=15, owner=, state=RUNNABLE, startTime=7032hrs, 28mins, 23sec ago, lastUpdate=7032hrs, 28mins, 23sec ago)
2017-06-09 01:03:34,499 ERROR [r12s3:9102.activeMasterManager] procedure2.ProcedureExecutor: corrupted procedure: Procedure=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure (id=77, owner=, state=RUNNABLE, startTime=7032hrs, 21mins, 11sec ago, lastUpdate=7032hrs, 21mins, 11sec ago)
{code}

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
>                 Key: HBASE-18152
>                 URL: https://issues.apache.org/jira/browse/HBASE-18152
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-18152.master.001.patch, 
> pv2-00000000000000000036.log, pv2-00000000000000000047.log, 
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often 
> we can recover from it, but sometimes we can't. It took me a while to capture 
> an instance of corruption. It turns out we are writing to the WAL out of 
> order, which undoes a basic tenet: that WAL content is ordered in line w/ 
> execution.
> Below I'll post a corrupt WAL.
> Looking at the write side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try to get more insight. Meantime, parking 
> this issue here to collect data in.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
