[
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067784#comment-17067784
]
Szabolcs Bukros commented on HBASE-23995:
-----------------------------------------
As Josh mentioned, both Split and Snapshot use PV2, so it should work. And since
it does work in 2.2, I started checking commits missing from the old branch.
HBASE-21375 looked promising: while it does not target this behavior, it looked
like a general improvement to the locking logic. I quickly backported and
re-tested it, but unfortunately it does not solve the issue.
Now that I know what to look for, I could find in the log the point where the
lock is passed from Split to Snapshot (hbase-master.log):
{code:java}
2020-03-26 14:32:31,945 INFO [PEWorker-8] procedure2.ProcedureExecutor:
Finished pid=28, state=SUCCESS; SplitTableRegionProcedure table=tab2,
parent=11544264d3485f5ff700562ca6b62acb,
daughterA=dcf89acf08c55f494fd93ceedd3f3445,
daughterB=bf84f2e23131d9488d9c56117d374187
in 1.0010sec
2020-03-26 14:32:31,946 DEBUG [PEWorker-8] locking.LockProcedure: LOCKED
pid=30, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
tableName=tab2, type=EXCLUSIVE
2020-03-26 14:32:31,948 INFO [PEWorker-8] procedure2.TimeoutExecutorThread:
ADDED pid=30, state=WAITING_TIMEOUT, locked=true;
org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=tab2,
type=EXCLUSIVE; timeout=600000, timestamp=1585233751948
2020-03-26 14:32:31,948 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000]
snapshot.SnapshotManager: Started snapshot: { ss=tabshot2 table=tab2 type=FLUSH
}
{code}
Curiously, in the rs log I can see PostOpenDeployTasks and compactions starting
to run while SplitTableRegionProcedure has the lock:
{code:java}
2020-03-26 14:32:31,918 INFO
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445]
regionserver.HRegionServer: Post open deploy tasks for
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:31,919 DEBUG
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445]
regionserver.CompactSplit: Small Compaction requested: system; Because: Opening
Region; compactionQueue=(longCompactions=0:shortCompactions=0), splitQueue=0
2020-03-26 14:32:31,921 DEBUG
[regionserver/c2504-node4:16020-longCompactions-1585218367783]
compactions.SortedCompactionPolicy: Selecting compaction from 1 store files, 0
compacting, 1 eligible, 100 blocking
2020-03-26 14:32:31,922 DEBUG
[regionserver/c2504-node4:16020-longCompactions-1585218367783]
regionserver.HStore: dcf89acf08c55f494fd93ceedd3f3445 - cf: Initiating minor
compaction (all files)
{code}
And the compaction only finishes at around the same time the snapshot is
finishing:
{code:java}
2020-03-26 14:32:32,088 INFO
[regionserver/c2504-node4:16020-longCompactions-1585218367783]
regionserver.CompactSplit: Completed compaction
region=tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445., storeName=cf,
priority=99, startTime=1585233151918; duration=0sec
2020-03-26 14:32:32,091 DEBUG
[regionserver/c2504-node4:16020-longCompactions-1585218367783]
regionserver.CompactSplit: Status
compactionQueue=(longCompactions=0:shortCompactions=0),
splitQueue=0233150936.bf84f2e23131d9488d9c56117d374187.
2020-03-26 14:32:32,101 DEBUG
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. completed.
2020-03-26 14:32:32,101 DEBUG
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
snapshot.FlushSnapshotSubprocedure: Closing snapshot operation on
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:32,102 DEBUG [member:
'c2504-node4.coelab.cloudera.com,16020,1585218362034'
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed
1/2 local region snapshots.
2020-03-26 14:32:32,102 DEBUG [member:
'c2504-node4.coelab.cloudera.com,16020,1585218362034'
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed
2/2 local region snapshots.
{code}
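To make the overlap easier to see, here is a minimal sketch that orders the timestamps quoted in the excerpts above and checks whether the snapshot started while the compaction of daughter dcf89acf08c55f494fd93ceedd3f3445 was still running. The event names and helper functions are made up for illustration, and it assumes the master and region server clocks are close enough to compare directly.

```python
from datetime import datetime

# Timestamps copied from the master and region server log excerpts above.
# The event names are illustrative labels, not anything HBase emits.
EVENTS = {
    "compaction_started": "2020-03-26 14:32:31,922",
    "split_finished": "2020-03-26 14:32:31,945",
    "snapshot_lock_taken": "2020-03-26 14:32:31,946",
    "snapshot_started": "2020-03-26 14:32:31,948",
    "compaction_completed": "2020-03-26 14:32:32,088",
    "snapshot_region_done": "2020-03-26 14:32:32,101",
}


def parse(ts):
    # The log timestamps use a comma before the milliseconds.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def timeline(events):
    """Return the event names sorted by their timestamp."""
    return [name for name, ts in sorted(events.items(), key=lambda kv: parse(kv[1]))]


def snapshot_overlaps_compaction(events):
    """True if the snapshot started while the compaction was still running."""
    t = {name: parse(ts) for name, ts in events.items()}
    return t["compaction_started"] < t["snapshot_started"] < t["compaction_completed"]


if __name__ == "__main__":
    for name in timeline(EVENTS):
        print(EVENTS[name], name)
    print("snapshot started during compaction:", snapshot_overlaps_compaction(EVENTS))
```

With these values the snapshot starts after the compaction was initiated and before it completed, which matches the reading of the logs above.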
> Snapshotting a splitting region results in corrupted snapshot
> -------------------------------------------------------------
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.2
> Reporter: Szabolcs Bukros
> Priority: Major
>
> The problem seems to originate from the fact that while the region split
> itself runs under a lock, the compactions following it run in separate threads.
> Alternatively, the use of space quota policies can prevent compaction after a
> split and lead to the same issue.
> In both cases the resulting snapshot keeps the split status of the parent
> region but does not keep the references to the daughter regions, because they
> (the splitA and splitB qualifiers) are stored separately in the meta table and
> do not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will
> find the parent region and realize it is in split state, but because it cannot
> find the daughter region references (they haven't propagated) it assumes the
> parent can be cleaned up and deletes it. The archived region used in the
> snapshot only has a back reference to the now also archived parent region, and
> if the snapshot is deleted they both get cleaned up. Unfortunately the daughter
> regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is taken before the compaction can
> finish, even with a small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the use case cleaner, but deleting both the
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what the best way to fix this would be. Preventing snapshots
> while a region is in split state would make snapshot creation problematic.
> Forcing compaction to run as part of the split thread would make it rather
> slow. Propagating the daughter region references could prevent the deletion
> of the cloned parent region, and the data would no longer be broken, but I'm
> not sure we have logic in place that could pick up the pieces and finish
> the split process.
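The failure mode in the description can be illustrated with a toy model. This is only a sketch of the reasoning, not HBase code: RegionRow, clone_from_snapshot and janitor_would_delete are hypothetical names, and the cleanup rule is a deliberate simplification of what CatalogJanitor is described as doing above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionRow:
    """Hypothetical, simplified stand-in for a parent region's row in meta."""
    name: str
    split: bool = False
    split_a: Optional[str] = None  # daughter reference (splitA qualifier)
    split_b: Optional[str] = None  # daughter reference (splitB qualifier)


def clone_from_snapshot(row):
    # Per the description: the split state is carried over with the snapshot,
    # but the splitA/splitB daughter references are not.
    return RegionRow(name=row.name, split=row.split)


def janitor_would_delete(row):
    # Toy rule: a split parent with no recorded daughters looks as if nothing
    # references it any more, so it is treated as safe to clean up.
    return row.split and row.split_a is None and row.split_b is None


if __name__ == "__main__":
    parent = RegionRow("parent", split=True, split_a="daughterA", split_b="daughterB")
    cloned = clone_from_snapshot(parent)
    print("original parent deleted:", janitor_would_delete(parent))
    print("cloned parent deleted:", janitor_would_delete(cloned))
```

In this model the original parent is kept because its daughters are still recorded, while the cloned parent is considered removable, which is the step that leads to the data loss described above.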