[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

Szabolcs Bukros (Jira) Thu, 26 Mar 2020 08:41:01 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067784#comment-17067784
 ]


Szabolcs Bukros commented on HBASE-23995:
-----------------------------------------

As Josh mentioned, both Split and Snapshot use PV2 (Procedure V2), so this 
should work. And since it does work in 2.2, I started checking commits missing 
from the old branch. HBASE-21375 looked promising: while it does not target 
this behavior, it looked like a general improvement to the locking logic. I 
quickly backported and re-tested it, but unfortunately it does not solve the 
issue.

Now that I know what to look for, I could find the point in the log where the 
lock is passed from Split to Snapshot (hbase-master.log):

 
{code:java}
2020-03-26 14:32:31,945 INFO  [PEWorker-8] procedure2.ProcedureExecutor: 
Finished pid=28, state=SUCCESS; SplitTableRegionProcedure table=tab2, 
parent=11544264d3485f5ff700562ca6b62acb, 
daughterA=dcf89acf08c55f494fd93ceedd3f3445, 
daughterB=bf84f2e23131d9488d9c56117d374187 in 1.0010sec
2020-03-26 14:32:31,946 DEBUG [PEWorker-8] locking.LockProcedure: LOCKED 
pid=30, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, 
tableName=tab2, type=EXCLUSIVE
2020-03-26 14:32:31,948 INFO  [PEWorker-8] procedure2.TimeoutExecutorThread: 
ADDED pid=30, state=WAITING_TIMEOUT, locked=true; 
org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=tab2, 
type=EXCLUSIVE; timeout=600000, timestamp=1585233751948
2020-03-26 14:32:31,948 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] 
snapshot.SnapshotManager: Started snapshot: { ss=tabshot2 table=tab2 type=FLUSH 
}
{code}
Curiously, in the RegionServer log I can see PostOpenDeployTasks and 
compactions starting to run while SplitTableRegionProcedure still holds the 
lock:
{code:java}
2020-03-26 14:32:31,918 INFO  
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] 
regionserver.HRegionServer: Post open deploy tasks for 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:31,919 DEBUG 
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] 
regionserver.CompactSplit: Small Compaction requested: system; Because: Opening 
Region; compactionQueue=(longCompactions=0:shortCompactions=0), splitQueue=0
2020-03-26 14:32:31,921 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
compactions.SortedCompactionPolicy: Selecting compaction from 1 store files, 0 
compacting, 1 eligible, 100 blocking
2020-03-26 14:32:31,922 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.HStore: dcf89acf08c55f494fd93ceedd3f3445 - cf: Initiating minor 
compaction (all files)
{code}
The compaction only finishes at around the same time as the snapshot:
{code:java}
2020-03-26 14:32:32,088 INFO  
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.CompactSplit: Completed compaction 
region=tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445., storeName=cf, 
priority=99, startTime=1585233151918; duration=0sec
2020-03-26 14:32:32,091 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.CompactSplit: Status 
compactionQueue=(longCompactions=0:shortCompactions=0), 
splitQueue=0233150936.bf84f2e23131d9488d9c56117d374187.
2020-03-26 14:32:32,101 DEBUG 
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
 snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. completed.
2020-03-26 14:32:32,101 DEBUG 
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
 snapshot.FlushSnapshotSubprocedure: Closing snapshot operation on 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:32,102 DEBUG [member: 
'c2504-node4.coelab.cloudera.com,16020,1585218362034' 
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 
1/2 local region snapshots.
2020-03-26 14:32:32,102 DEBUG [member: 
'c2504-node4.coelab.cloudera.com,16020,1585218362034' 
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 
2/2 local region snapshots.
{code}
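To make the ordering in the excerpts above easier to see, here is a minimal sketch that lines up the relevant timestamps and checks whether the compaction window overlaps the snapshot window. The timestamps are copied from the log lines quoted above; the event labels and the helper names (`parse`, `overlap_ms`) are my own shorthand, not anything from HBase. It also assumes the master and RegionServer clocks are in sync, which the logs alone do not prove.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log excerpts above.
# The event labels are illustrative shorthand, not HBase identifiers.
EVENTS = {
    "compaction_started": "2020-03-26 14:32:31,922",    # rs log: Initiating minor compaction
    "split_finished": "2020-03-26 14:32:31,945",        # master log: Finished pid=28
    "snapshot_started": "2020-03-26 14:32:31,948",      # master log: Started snapshot
    "compaction_completed": "2020-03-26 14:32:32,088",  # rs log: Completed compaction
    "region_snapshot_done": "2020-03-26 14:32:32,101",  # rs log: Flush Snapshotting ... completed
}


def parse(ts):
    """Parse a log4j-style timestamp with comma-separated milliseconds."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def overlap_ms(start_a, end_a, start_b, end_b):
    """Return the overlap of two time windows in whole milliseconds (0 if none)."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    if end <= start:
        return 0
    return (end - start) // timedelta(milliseconds=1)


t = {name: parse(ts) for name, ts in EVENTS.items()}

# The compaction begins before the split procedure reports SUCCESS ...
compaction_started_before_split_finished = (
    t["compaction_started"] < t["split_finished"]
)

# ... and is still running while the snapshot is being taken.
compaction_snapshot_overlap_ms = overlap_ms(
    t["compaction_started"], t["compaction_completed"],
    t["snapshot_started"], t["region_snapshot_done"],
)

if __name__ == "__main__":
    for name, when in sorted(t.items(), key=lambda kv: kv[1]):
        print(when.strftime("%H:%M:%S.%f")[:-3], name)
    print("compaction started before split finished:",
          compaction_started_before_split_finished)
    print("compaction/snapshot overlap (ms):", compaction_snapshot_overlap_ms)
```

If the clocks are trustworthy, a non-zero overlap here is consistent with the snapshot capturing the daughter region while its post-split compaction is still in progress.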
 

 

 

> Snapshoting a splitting region results in corrupted snapshot
> ------------------------------------------------------------
>
>                 Key: HBASE-23995
>                 URL: https://issues.apache.org/jira/browse/HBASE-23995
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.0.2
>            Reporter: Szabolcs Bukros
>            Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs under a lock, the compactions following it run in separate 
> threads. Alternatively, the use of space quota policies can prevent 
> compaction after a split and lead to the same issue.
> In both cases the resulting snapshot keeps the split status of the parent 
> region, but does not keep the references to the daughter regions, because 
> they (splitA, splitB qualifiers) are stored separately in the meta table and 
> do not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it cannot 
> find the daughter region references (they haven't propagated), assume the 
> parent can be cleaned up and delete it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is taken before the compaction can 
> finish, even with a small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the use case cleaner, but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what the best way to fix this would be. Preventing snapshots 
> while a region is in split state would make snapshot creation problematic. 
> Forcing compaction to run as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region, and the data would no longer be broken, but I'm 
> not sure we have logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
