[
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060728#comment-17060728
]
Duo Zhang commented on HBASE-23995:
-----------------------------------
If the split has succeeded, you do not need to snapshot the parent region, right?
So whether the compaction leads to removal of the parent region does not
matter.
Or is your title misleading? You just say 'Snapshoting a splitting region'; I
would say that in newer versions of HBase this is impossible.
Maybe there are other problems that break the snapshot, but at least it is not
'Snapshoting a splitting region'.
Thanks.
> Snapshoting a splitting region results in corrupted snapshot
> ------------------------------------------------------------
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.2
> Reporter: Szabolcs Bukros
> Priority: Major
>
> The problem seems to originate from the fact that while the region split
> itself runs in a lock, the compactions following it run in separate threads.
> Alternatively, the use of space quota policies can prevent compaction after a
> split and lead to the same issue.
> In both cases the resulting snapshot keeps the split status of the parent
> region but does not keep the references to the daughter regions, because they
> (the splitA and splitB qualifiers) are stored separately in the meta table and
> do not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will
> find the parent region, realize it is in split state, and, because it cannot
> find the daughter region references (they have not propagated), assume the
> parent can be cleaned up and delete it. The archived region used in the
> snapshot only has a back reference to the now also archived parent region, and
> if the snapshot is deleted they both get cleaned up. Unfortunately the daughter
> regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should ensure the snapshot is taken before the compaction can
> finish, even with a small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the use case cleaner, but deleting both the
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what the best way to fix this would be. Preventing snapshots
> while a region is in split state would make snapshot creation problematic.
> Forcing compaction to run as part of the split thread would make the split
> rather slow. Propagating the daughter region references could prevent the
> deletion of the cloned parent region, so the data would no longer be lost, but
> I'm not sure we have logic in place that could pick up the pieces and finish
> the split process.
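The failure mode described in the quoted issue can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Daughter` and `Parent` structs and the `cleanable?` rule are hypothetical names invented for illustration, not HBase's actual CatalogJanitor code. It shows why a parent that is marked as split but has lost its daughter references would look safe to delete, and why propagating those references would block the deletion.

```ruby
# Toy model only: Daughter, Parent and cleanable? are hypothetical stand-ins
# for illustration, not real HBase classes or methods.
Daughter = Struct.new(:name, :references_parent)
Parent = Struct.new(:name, :split, :daughters)

# Simplified rule taken from the issue description: a split parent is treated
# as cleanable when no known daughter still references it. With no known
# daughters at all, that condition is trivially satisfied.
def cleanable?(parent)
  parent.split && parent.daughters.none?(&:references_parent)
end

# Source table: split parent whose daughters still reference it (no compaction yet).
original = Parent.new("parent", true,
                      [Daughter.new("daughterA", true), Daughter.new("daughterB", true)])

# Cloned table as described in the report: the split state is kept, but the
# splitA/splitB references did not propagate with the snapshot.
cloned = Parent.new("parent", true, [])

puts "original cleanable: #{cleanable?(original)}"  # false
puts "cloned cleanable: #{cleanable?(cloned)}"      # true
```

In this model, carrying the daughter references over to the cloned parent makes it behave like `original`, which corresponds to the "propagate the daughter region references" option mentioned in the description.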