[ https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588571#comment-16588571 ]
Allan Yang commented on HBASE-18152:
------------------------------------
My test env reproduced this issue again with HBASE-20939, so that change may
not have fully fixed it.
{code}
2018-08-22 15:57:35,010 ERROR [Thread-14] procedure2.ProcedureExecutor(388): Corrupt pid=16633, ppid=16510, state=WAITING_TIMEOUT:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=3407f9af348ff198a3bdd3a9ae7db02c
{code}
> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
> Key: HBASE-18152
> URL: https://issues.apache.org/jira/browse/HBASE-18152
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments:
> 0001-TestWALProcedureExecutore-order-checking-test-that-d.patch,
> HBASE-18152.master.001.patch,
> hbase-hbase-master-ctr-e138-1518143905142-221855-01-000002.hwx.site.log.gz,
> pv2-00000000000000000036.log, pv2-00000000000000000047.log,
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often
> we can get over it but sometimes we can't. It took me a while to capture an
> instance of corruption. Turns out we are writing to the WAL out of order,
> which undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write side, there is a lot going on. I'm not clear on how we
> could write out of order. Will try and get more insight. Meantime, parking
> this issue here to fill data into.