[
https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687286#comment-13687286
]
Jerry He commented on HBASE-8760:
---------------------------------
[~mbertozzi]
Yes, restore/clone snapshot works well if the parent hfile has not been
deleted. I am amazed to see that the part of the code that figures out the
Links and half References works well. Below are some region server log dumps.
If the parent hfile has been deleted before the restore/clone, this is the error.
table1 is the original table that was snapshotted. table1_clone is the table
created by clone_snapshot.
--------------------------------------------------------------------------------------------------------
$ hadoop fs -lsr /hbase/.hbase-snapshot
/hbase/.hbase-snapshot/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot
/hbase/.hbase-snapshot/my_table1_snapshot/.snapshotinfo
/hbase/.hbase-snapshot/my_table1_snapshot/.tableinfo.0000000001
/hbase/.hbase-snapshot/my_table1_snapshot/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/.regioninfo
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/family1
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/family1/c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/.regioninfo
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/family1
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/family1/c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714
2013-06-14 22:40:03,065 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.
2013-06-14 22:40:03,065 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13f2c26b6106505 Attempting to transition node
3ab8becbaddb796fc8a036762dbd9493 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13f2c26b6106505 Successfully transitioned node
3ab8becbaddb796fc8a036762dbd9493 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Opening region: {NAME =>
'table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.', STARTKEY =>
'', ENDKEY => 'user1959958463', ENCODED => 3ab8becbaddb796fc8a036762dbd9493,}
2013-06-14 22:40:03,067 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Setting up tabledescriptor config now ...
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.
2013-06-14 22:40:03,069 INFO org.apache.hadoop.hbase.regionserver.Store: time
to purge deletes set to 0ms in store family1
2013-06-14 22:40:03,069 INFO org.apache.hadoop.hbase.regionserver.Store:
hbase.hstore.compaction.min = 3
2013-06-14 22:40:03,070 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile:
reference
'hdfs://hdtest009:9000/hbase/table1_clone/3ab8becbaddb796fc8a036762dbd9493/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714'
to region=3e96bb19fb20e4edd27949f894878714
hfile=table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14
2013-06-14 22:40:03,071 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile:
Store file
hdfs://hdtest009:9000/hbase/table1_clone/3ab8becbaddb796fc8a036762dbd9493/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714
is a bottom reference to
hdfs://hdtest009:9000/hbase/table1_clone/3e96bb19fb20e4edd27949f894878714/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14
2013-06-14 22:40:03,072 ERROR
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of
region=table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493., starting
to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: Unable
to open link: org.apache.hadoop.hbase.io.HFileLink
locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:631)
at
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:544)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4372)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4320)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:101)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:738)
Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open
link: org.apache.hadoop.hbase.io.HFileLink
locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
at
org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:481)
at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:258)
at
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3322)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:606)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:604)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
... 3 more
Caused by: java.io.FileNotFoundException: Unable to open link:
org.apache.hadoop.hbase.io.HFileLink
locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:375)
at
org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:97)
at
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:537)
at
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:639)
at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:457)
at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:452)
... 8 more
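To make the failure easier to read, here is a minimal sketch of how the three
"locations" in the exception relate to the reference file name in the clone.
It is plain string handling based only on the names visible in the log above,
not the actual HFileLink code: the class LinkNameSketch and its methods are
hypothetical, and the layout <table>=<region>-<hfile>.<parentRegion> is an
assumption read off the DEBUG lines.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of HBase) that splits a cloned-table
 * reference file name and lists the paths the region server reported
 * as tried in the FileNotFoundException above.
 */
public class LinkNameSketch {

    /** Pieces of a name shaped like table=region-hfile.parentRegion (assumed layout). */
    public static final class ParsedName {
        public final String table;
        public final String region;
        public final String hfile;
        public final String parentRegion;

        ParsedName(String table, String region, String hfile, String parentRegion) {
            this.table = table;
            this.region = region;
            this.hfile = hfile;
            this.parentRegion = parentRegion;
        }
    }

    /** Splits the name on the first '=', the first '-' after it, and the last '.'. */
    public static ParsedName parse(String name) {
        int eq = name.indexOf('=');
        int dash = name.indexOf('-', eq + 1);
        int dot = name.lastIndexOf('.');
        if (eq < 0 || dash < 0 || dot < dash) {
            throw new IllegalArgumentException("Unexpected reference name: " + name);
        }
        return new ParsedName(
            name.substring(0, eq),
            name.substring(eq + 1, dash),
            name.substring(dash + 1, dot),
            name.substring(dot + 1));
    }

    /** Builds the live, .tmp and .archive candidates, in the order shown in the log. */
    public static List<String> candidateLocations(String rootDir, ParsedName p, String family) {
        String tail = p.table + "/" + p.region + "/" + family + "/" + p.hfile;
        List<String> locations = new ArrayList<>();
        locations.add(rootDir + "/" + tail);
        locations.add(rootDir + "/.tmp/" + tail);
        locations.add(rootDir + "/.archive/" + tail);
        return locations;
    }

    public static void main(String[] args) {
        // File name taken from the StoreFile DEBUG line in the log above.
        ParsedName p = parse("table1=3e96bb19fb20e4edd27949f894878714"
            + "-c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714");
        for (String location : candidateLocations("hdfs://hdtest009:9000/hbase", p, "family1")) {
            System.out.println(location);
        }
    }
}
```

With the parent hfile c272990ce92c409d8cdebd6afcb8cc14 gone from all three
places, there is nothing left for the bottom reference to read, which matches
the FileNotFoundException above.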
If the parent hfile is still present, everything works ok.
----------------------------------------------------------------------------
2013-06-18 14:58:50,026 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 1
region(s)
2013-06-18 14:58:50,026 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.
2013-06-18 14:58:50,031 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13f58635d150013 Attempting to transition node
6dab7a8a16b0e195785d52ad7b15bd09 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING
2013-06-18 14:58:50,033 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13f58635d150013 Successfully transitioned node
6dab7a8a16b0e195785d52ad7b15bd09 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING
2013-06-18 14:58:50,034 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Opening region: {NAME =>
'table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.', STARTKEY =>
'', ENDKEY => 'user1959958463', ENCODED => 6dab7a8a16b0e195785d52ad7b15bd09,}
2013-06-18 14:58:50,035 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Setting up tabledescriptor config now ...
2013-06-18 14:58:50,035 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.
2013-06-18 14:58:50,039 INFO org.apache.hadoop.hbase.regionserver.Store: time
to purge deletes set to 0ms in store family1
2013-06-18 14:58:50,039 INFO org.apache.hadoop.hbase.regionserver.Store:
hbase.hstore.compaction.min = 3
2013-06-18 14:58:50,045 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile:
reference
'hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3'
to region=352470e8ef4d15b034ab1165b07e35e3
hfile=table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24
2013-06-18 14:58:50,049 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile:
Store file
hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3
is a bottom reference to
hdfs://hdtest009:9000/hbase/table1_clone/352470e8ef4d15b034ab1165b07e35e3/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24
2013-06-18 14:58:50,062 DEBUG org.apache.hadoop.hbase.regionserver.Store:
loaded
hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3,
isReference=true, isBulkLoadResult=false, seqid=32549, majorCompaction=false
2013-06-18 14:58:50,064 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.; next
sequenceid=32550
> possible loss of data in snapshot taken after region split
> ----------------------------------------------------------
>
> Key: HBASE-8760
> URL: https://issues.apache.org/jira/browse/HBASE-8760
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 0.94.8
> Reporter: Jerry He
> Assignee: Jerry He
> Fix For: 0.94.8
>
> Attachments: HBase-8760-0.94.8.patch
>
>
> Right after a region split but before the daughter regions are compacted, we
> have two daughter regions containing Reference files to the parent hfiles.
> If we take a snapshot at that moment, the snapshot will succeed, but it
> will only contain the daughter Reference files. Since there is no hold on the
> parent hfiles, they will be deleted by the HFile Cleaner soon after they are
> no longer needed by the daughter regions.
> At a minimum, we need to keep these parent hfiles from being deleted.