[
https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367611#comment-16367611
]
stack commented on HBASE-20006:
-------------------------------
With the patch in place, we make more progress. We now log the following:
2018-02-16 14:35:01,027 INFO [PEWorker-15]
procedure.MasterProcedureScheduler(571): pid=105,
state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure
table=testOnlineSnapshotAfterSplittingRegions-1518791689780,
parent=034c0b19e0cdb4c5788c2d4172fd16d9,
daughterA=b5355f606c3f6dae55367b082065b41c,
daughterB=094cf44c1d0b3a294f42d2017fd99907,
table=testOnlineSnapshotAfterSplittingRegions-1518791689780,
testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.
2018-02-16 14:35:01,027 INFO [PEWorker-15]
assignment.SplitTableRegionProcedure(439): Split of {ENCODED =>
034c0b19e0cdb4c5788c2d4172fd16d9, NAME =>
'testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.',
STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
... but rather than failing we then move on to...
2018-02-16 14:35:01,031 INFO [PEWorker-15] procedure2.ProcedureExecutor(1249):
Finished pid=105, state=SUCCESS; SplitTableRegionProcedure
table=testOnlineSnapshotAfterSplittingRegions-1518791689780,
parent=034c0b19e0cdb4c5788c2d4172fd16d9,
daughterA=b5355f606c3f6dae55367b082065b41c,
daughterB=094cf44c1d0b3a294f42d2017fd99907 in 1.0180sec
... which is good in this case at least.
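For illustration only, the behavior in the log above (the prepare step notices the parent is already SPLIT, skips the work, and the procedure finishes with SUCCESS instead of tripping an assert) can be sketched roughly as below. This is a simplified, hypothetical model and not the actual patch: `SplitPrepareSketch`, `RegionState`, `Flow` and `prepare` are names made up for the example and are not HBase classes.

```java
// Hypothetical, simplified model of an idempotent "prepare" step.
// None of these types are HBase classes; they only illustrate the idea.
final class SplitPrepareSketch {

    /** Made-up region states for the example. */
    enum RegionState { OPEN, SPLITTING, SPLIT }

    /** Made-up result: keep running later states, or finish now. */
    enum Flow { HAS_MORE_STATE, NO_MORE_STATE }

    /**
     * Decides what the split should do after the prepare step.
     * An already-split parent is treated as "nothing left to do"
     * rather than as an internal error.
     */
    static Flow prepare(RegionState parentState) {
        if (parentState == RegionState.SPLIT) {
            // Already split: skip quietly and let the procedure finish.
            return Flow.NO_MORE_STATE;
        }
        if (parentState != RegionState.OPEN) {
            // Any other unexpected state is reported as a normal failure,
            // not as an AssertionError escaping the worker thread.
            throw new IllegalStateException("cannot split region in state " + parentState);
        }
        return Flow.HAS_MORE_STATE;
    }

    public static void main(String[] args) {
        System.out.println(prepare(RegionState.SPLIT));
        System.out.println(prepare(RegionState.OPEN));
    }
}
```

The point of the sketch is only that a repeated split request becomes a no-op at the prepare stage, so the executor never sees an uncaught runtime exception.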
Now I'm on to a new failure type:
Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://localhost:55231/user/jenkins/test-data/fe7360bf-946e-44d4-8682-120eae0b7055/data/default/testOnlineSnapshotAfterSplittingRegions-1518791702651/1dda732469ff033fa21cc271586a80b5/cf/testOnlineSnapshotAfterSplittingRegions-1518791689780=034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85.3cc16fba4ef7fb478d3eb1626a24a661
at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:108)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:267)
at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:352)
at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:460)
at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:668)
at org.apache.hadoop.hbase.regionserver.HStore.lambda$openStoreFiles$0(HStore.java:535)
... 6 more
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401)
at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:532)
... 14 more
The file name is crazy.
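To make the name easier to read, here is a small sketch that pulls it apart. It assumes the layout is `<table>=<region>-<hfile>.<suffix>`, which is only my reading of the string in the trace; the class `StoreFileNameSketch`, the method `parts` and the four field labels are made up for the example and are not HBase APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the layout below is an assumption read off the
// file name in the stack trace, not a documented contract.
final class StoreFileNameSketch {

    // Assumed layout: <table>=<region>-<hfile>.<suffix>
    private static final Pattern NAME =
        Pattern.compile("^([^=]+)=([0-9a-f]+)-([0-9a-f]+)\\.([0-9a-f]+)$");

    /** Returns {table, region, hfile, suffix}, or throws if the name does not fit. */
    static String[] parts(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String name = "testOnlineSnapshotAfterSplittingRegions-1518791689780"
            + "=034c0b19e0cdb4c5788c2d4172fd16d9"
            + "-395104433d8d43e7b6710b6ec44d5b85"
            + ".3cc16fba4ef7fb478d3eb1626a24a661";
        String[] labels = { "table", "region", "hfile", "suffix" };
        String[] p = parts(name);
        for (int i = 0; i < p.length; i++) {
            System.out.println(labels[i] + " = " + p[i]);
        }
    }
}
```

Read this way, the region part matches the split parent (034c0b19...) from the log earlier in this comment, while the table part names the original table rather than the one whose directory holds the file.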
> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
> ---------------------------------------------------------
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-20006.branch-2.001.patch
>
>
> Fails about 10% of the time. Interestingly, it is the sequence below that
> causes the failure. We go to split a region, but it is already split. We
> then fail the split with an internal assert, which messes up procedures; at
> a minimum we should just not split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO [PEWorker-12]
> procedure.MasterProcedureScheduler(571): pid=105,
> state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838,
> parent=3f850cea7d71a7ebd019f2f009efca4d,
> daughterA=06b5e6366efbef155d70e56cfdf58dc9,
> daughterB=8c175de1b33765a5683ac1e502edb0bd,
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838,
> testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO [PEWorker-12]
> assignment.SplitTableRegionProcedure(440): Split of {ENCODED =>
> 3f850cea7d71a7ebd019f2f009efca4d, NAME =>
> 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.',
> STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12]
> procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception:
> pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838,
> parent=3f850cea7d71a7ebd019f2f009efca4d,
> daughterA=06b5e6366efbef155d70e56cfdf58dc9,
> daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
> at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
> at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
> at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}