[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060728#comment-17060728
 ] 

Duo Zhang commented on HBASE-23995:
-----------------------------------

If the split has succeeded, you do not need to snapshot the parent region, right? 
So whether the compaction leads to removal of the parent region does not 
matter.

Or is your title misleading? You just say 'Snapshoting a splitting region', but I 
would say that in newer versions of HBase this is impossible.

Maybe there are other problems that break the snapshot, but at least it is not 
'Snapshoting a splitting region'.

Thanks.

> Snapshoting a splitting region results in corrupted snapshot
> ------------------------------------------------------------
>
>                 Key: HBASE-23995
>                 URL: https://issues.apache.org/jira/browse/HBASE-23995
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.0.2
>            Reporter: Szabolcs Bukros
>            Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively, the use of space quota policies can prevent compaction after a 
> split, which leads to the same issue.
> In both cases the resulting snapshot keeps the split status of the parent 
> region, but does not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, and, because it cannot 
> find the daughter region references (they have not propagated), assume the 
> parent can be cleaned up and delete it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should ensure the snapshot is taken before the compaction can 
> finish, even with a small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the use case cleaner, but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> while a region is in split state would make snapshot creation problematic. 
> Forcing compaction to run as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region, and the data would no longer be broken, but I'm 
> not sure we have logic in place that could pick up the pieces and finish 
> the split process.
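
The cleanup decision described in the quoted issue can be sketched as a toy model. This is only an illustration under stated assumptions: plain Ruby hashes stand in for hbase:meta rows, and every name here (SOURCE_PARENT, clone_from_snapshot, janitor_would_clean?) is hypothetical, not an HBase API. It shows only why a cloned parent row that keeps its split flag but loses its splitA/splitB references would look safe to clean up.

```ruby
# Toy model only: hashes stand in for hbase:meta rows. None of these
# names are HBase APIs; they are hypothetical and for illustration.

# Parent row in the source table: split flag plus daughter references.
SOURCE_PARENT = {
  name: "parent", split: true, split_a: "daughterA", split_b: "daughterB"
}.freeze

# Cloning from the snapshot, as the issue describes it: the split flag
# is kept, the splitA/splitB references are not.
def clone_from_snapshot(row)
  row.merge(split_a: nil, split_b: nil)
end

# Simplified janitor rule taken from the description: a split parent is
# treated as cleanable when no known daughter still references it.
def janitor_would_clean?(row, daughters_referencing_parent)
  return false unless row[:split]
  daughters = [row[:split_a], row[:split_b]].compact
  daughters.none? { |d| daughters_referencing_parent.include?(d) }
end

# Daughters that have not been compacted yet still reference the parent.
referencing = ["daughterA", "daughterB"]

puts janitor_would_clean?(SOURCE_PARENT, referencing)                      # source table
puts janitor_would_clean?(clone_from_snapshot(SOURCE_PARENT), referencing) # cloned table
```

In this model the source table's parent is kept because its daughters are known and still reference it, while the cloned parent has no known daughters and is therefore judged cleanable, which matches the sequence the reporter describes.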



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
