[
https://issues.apache.org/jira/browse/HBASE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574593#comment-13574593
]
Jonathan Hsieh commented on HBASE-7764:
---------------------------------------
I'm fine with the patch but I really want an explanation of why split affects
snapshot/clone unit tests. The connection is non-obvious to me.
I believe the story roughly goes like this:
Online snapshotting (until we use table locks) can fail if splits or balances
occur while a snapshot is in progress. We've changed the split policy, so the
failure likely has to do with split policies.
The fix changes the split policy to ConstantSizeRegionSplitPolicy, a policy
that does not split until the specified size is reached, in this case 10GB (or
64MB). These tests write a few thousand rows which are, for the sake of
argument, roughly 100MB. With this split policy, splitting doesn't occur while
snapshotting in the unit test, so snapshots succeed.
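As a rough illustration of the fixed-threshold check described above, here is a
simplified sketch. It is not the actual ConstantSizeRegionSplitPolicy code; the
class name, the method name, and the sizes used are assumptions made for this
example only.

```java
public class ConstantSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Hypothetical helper: request a split only when the largest store in the
    // region has grown past a fixed, configured threshold.
    static boolean shouldSplit(long largestStoreSizeBytes, long maxFileSizeBytes) {
        return largestStoreSizeBytes > maxFileSizeBytes;
    }

    public static void main(String[] args) {
        // Assumed numbers: ~100MB of test data against a 10GB threshold.
        System.out.println(shouldSplit(100 * MB, 10 * GB));
        // A store that has outgrown the threshold would be split.
        System.out.println(shouldSplit(11 * GB, 10 * GB));
    }
}
```

Under these assumed numbers the check stays false for the test data, so a
forced flush during the snapshot would not be followed by a split.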
The default splitting algorithm as of 0.94 is
IncreasingToUpperBoundRegionSplitPolicy. From its comments:
{code}
/**
* Split size is the number of regions that are on this server that all are
* of the same table, squared, times the region flush size OR the maximum
* region split size, whichever is smaller. For example, if the flush size
* is 128M, then on first flush we will split which will make two regions
* that will split when their size is 2 * 2 * 128M = 512M. If one of these
* regions splits, then there are three regions and now the split size is
* 3 * 3 * 128M = 1152M, and so on until we reach the configured
* maximum filesize and then from there on out, we'll use that.
*/
{code}
The key point is *on the first flush*. Since the online snapshot mechanism
forces a flush, it seems from the description that it would also trigger a
split. Since splits are a cause of valid online snapshot failures, this is the
likely cause of these unit test failures.
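The arithmetic in the quoted comment can be sketched as follows. This is a
minimal illustration of the formula as the comment states it, not the real
IncreasingToUpperBoundRegionSplitPolicy implementation; the class name, method
name, and the 10GB upper bound are assumptions chosen for the example.

```java
public class IncreasingSplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Hypothetical helper: split size is (regions of this table on the server)^2
    // times the flush size, capped at the configured maximum file size.
    static long splitSize(int regionCount, long flushSizeBytes, long maxFileSizeBytes) {
        long candidate = (long) regionCount * regionCount * flushSizeBytes;
        return Math.min(candidate, maxFileSizeBytes);
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;          // flush size used in the quoted comment
        long maxFileSize = 10L * 1024 * MB; // assumed upper bound of 10GB
        for (int regions = 1; regions <= 4; regions++) {
            long sizeMb = splitSize(regions, flushSize, maxFileSize) / MB;
            System.out.println(regions + " region(s): split at " + sizeMb + "M");
        }
    }
}
```

With a single region the threshold equals the flush size itself, which is why
the comment says the first flush leads to a split; with two and three regions
the sketch reproduces the 512M and 1152M figures from the comment.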
Do you buy the story?
> [snapshot 130201 merge] Fix TestSnapshotCloneIndependence failure
> -----------------------------------------------------------------
>
> Key: HBASE-7764
> URL: https://issues.apache.org/jira/browse/HBASE-7764
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 7764.txt, 7764-v2.txt, 7764-v3.txt
>
>
> Here is log snippet for
> TestSnapshotCloneIndependence#testOnlineSnapshotRegionOperationsIndependent
> (pay attention to region 1360020297284.2e43e47a882d3cff601eb222cad41f20.):
> {code}
> 2013-02-04 15:24:58,369 INFO
> [MASTER_SERVER_OPERATIONS-10.11.2.194,61955,1360020289835-0]
> handler.SplitRegionHandler(115): Handled SPLIT event;
> parent=test1360020295791,,1360020295793.794d37c0445b61619b5056623228827d.
> daughter
> a=test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.daughter
> b=test1360020295791,dgb,1360020297284.b87834cee60702d883aa287df6aaeaef.
> ...
> 2013-02-04 15:25:02,005 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask(78): Starting region
> operation on
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> 2013-02-04 15:25:02,005 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask(81): Flush Snapshotting
> region test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> started...
> 2013-02-04 15:25:02,005 DEBUG [member:
> '10.11.2.194,61958,1360020290064' subprocedure-pool3-thread-1]
> snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool(318): Completed
> 2/3 local region snapshots.
> 2013-02-04 15:25:02,005 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> regionserver.HRegion(2485): Storing region-info for snapshot.
> 2013-02-04 15:25:02,006 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> util.FSUtils(166): Creating
> file=hdfs://localhost:61942/user/tyu/hbase/.snapshot/.tmp/snapshot_test1360020295791/2e43e47a882d3cff601eb222cad41f20/.tmp/.regioninfo
> with permission=rwxrwxrwx
> 2013-02-04 15:25:02,014 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> regionserver.HRegion(2489): Creating references for hfiles
> 2013-02-04 15:25:02,014 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> regionserver.HRegion(2502): Adding snapshot references for
> [hdfs://localhost:61942/user/tyu/hbase/test1360020295791/2e43e47a882d3cff601eb222cad41f20/fam/946b75e5f0ba445aa1c646b8cbc87e02]
> hfiles
> 2013-02-04 15:25:02,015 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> regionserver.HRegion(2516): Creating reference for file (1/1) :
> hdfs://localhost:61942/user/tyu/hbase/test1360020295791/2e43e47a882d3cff601eb222cad41f20/fam/946b75e5f0ba445aa1c646b8cbc87e02
> 2013-02-04 15:25:02,017 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask(84): ... Flush
> Snapshotting region
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20. completed.
> 2013-02-04 15:25:02,017 DEBUG
> [rs(10.11.2.194,61958,1360020290064)-snapshot-pool5-thread-1]
> snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask(86): Closing region
> operation on
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> ...
> 2013-02-04 15:25:03,836 INFO [PRI IPC Server handler 7 on 61958]
> regionserver.HRegionServer(3600): Splitting
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> 2013-02-04 15:25:03,836 DEBUG [PRI IPC Server handler 7 on 61958]
> regionserver.HStore(1686): cannot split because midkey is the same as first
> or last row
> 2013-02-04 15:25:03,836 DEBUG [PRI IPC Server handler 7 on 61958]
> regionserver.CompactSplitThread(172): Region
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20. not
> splittable because midkey=null
> 2013-02-04 15:25:03,837 DEBUG [main] catalog.CatalogTracker(231): Stopping
> catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@5c602d9d
> 2013-02-04 15:25:03,838 INFO [main]
> client.TestSnapshotCloneIndependence(307): split requested for
> test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> 2013-02-04 15:25:03,892 DEBUG [main] client.MetaScanner(199): Scanning .META.
> starting at row=test1360020295791,,00000000000000 for max=2147483647 rows
> using hconnection 0x53d9f80
> 2013-02-04 15:25:03,895 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020295793.794d37c0445b61619b5056623228827d.',
> STARTKEY => '', ENDKEY => '', ENCODED =>
> 794d37c0445b61619b5056623228827d, OFFLINE => true, SPLIT => true,}
> 2013-02-04 15:25:03,896 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020295793.794d37c0445b61619b5056623228827d.',
> STARTKEY => '', ENDKEY => '', ENCODED =>
> 794d37c0445b61619b5056623228827d, OFFLINE => true, SPLIT => true,}
> 2013-02-04 15:25:03,896 DEBUG [main]
> client.MetaScanner$BlockingMetaScannerVisitor(441): blocking until region is
> in META: test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.
> 2013-02-04 15:25:03,897 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.',
> STARTKEY => '', ENDKEY => 'dgb', ENCODED =>
> 2e43e47a882d3cff601eb222cad41f20,}
> 2013-02-04 15:25:03,898 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.',
> STARTKEY => '', ENDKEY => 'dgb', ENCODED =>
> 2e43e47a882d3cff601eb222cad41f20,}
> 2013-02-04 15:25:03,898 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.',
> STARTKEY => '', ENDKEY => 'dgb', ENCODED =>
> 2e43e47a882d3cff601eb222cad41f20,}
> 2013-02-04 15:25:03,898 DEBUG [main] client.MetaScanner(252): Current INFO
> from scan results = {NAME =>
> 'test1360020295791,,1360020297284.2e43e47a882d3cff601eb222cad41f20.',
> STARTKEY => '', ENDKEY => 'dgb', ENCODED =>
> 2e43e47a882d3cff601eb222cad41f20,}
> {code}
> In the code we choose the first region from the original table to split,
> which happens not to be splittable.