[ https://issues.apache.org/jira/browse/HBASE-30241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Liu updated HBASE-30241:
-----------------------------
Status: Patch Available (was: In Progress)
> TestMobCloneSnapshotFromClientAfterSplittingRegion is flaky because a split
> may build whole-file HFileLinks instead of References
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-30241
> URL: https://issues.apache.org/jira/browse/HBASE-30241
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, test
> Reporter: Xiao Liu
> Assignee: Xiao Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
>
>
> h2. Symptom
> {{TestMobCloneSnapshotFromClientAfterSplittingRegion}} and the non-MOB
> {{TestCloneSnapshotFromClientAfterSplittingRegion}}, which share a base class, are flaky.
> {{testCloneSnapshotAfterSplittingRegion}} fails intermittently with:
> {code}
> org.opentest4j.AssertionFailedError: expected: not <null>
> at ...CloneSnapshotFromClientAfterSplittingRegionTestBase.testCloneSnapshotAfterSplittingRegion:99
> {code}
> h2. Root cause
> {{CloneSnapshotFromClientAfterSplittingRegionTestBase.splitRegion()}} splits on the second row,
> expecting the snapshot to contain Reference files so that the cloned table records the
> parent-to-daughter split lineage (SPLITA/SPLITB) in meta -- the behavior HBASE-29111 added the
> {{assertNotNull(daughter)}} check for.
> However, since HBASE-26421, when a store file lies entirely on one side of the split point the
> split builds a whole-file HFileLink for the daughter instead of a Reference. The test data is
> generated from the MD5 of the wall-clock time ({{SnapshotTestingUtils.loadData}}), so the region
> being split can end up with all of its store files on one side of the split row. When that
> happens the snapshot contains only HFileLinks and no Reference files.
> {{RestoreSnapshotHelper}} only records the parent-to-children mapping when it restores a
> Reference file ({{restoreReferenceFile}}); the HFileLink path does not. So in the all-HFileLink
> case the cloned table's split parent has no SPLITA/SPLITB columns,
> {{MetaTableAccessor.getDaughterRegions}} returns {{(null, null)}}, and the assertion fails.
> Because it depends on randomly generated keys, the failure is non-deterministic.
> This is purely a test-side issue. In the all-HFileLink case the daughters link directly to the
> snapshot files and do not depend on the cloned parent, so not recording SPLITA/SPLITB is correct
> and safe -- there is no data loss. The product behavior is fine; only the test's assumption that
> a split always yields References is wrong.
> h2. Fix
> Make the test deterministically produce a Reference file: in {{splitRegion()}}, major-compact
> each region into a single store file spanning the whole region key range before splitting. The
> split row then always falls inside an existing file, which yields a Reference (and the
> SPLITA/SPLITB lineage the test asserts on). Automatic compaction is disabled on the region
> server in this test, so the region is compacted directly via {{HRegion.compact(true)}} (a
> synchronous compaction), and auto-compaction stays disabled afterwards so the post-split
> reference files are not compacted away.
> h2. Additional coverage
> Making the existing test deterministic means it now only exercises the Reference path. To cover
> the complementary all-HFileLink path -- which was previously only hit by chance -- add
> {{CloneSnapshotFromClientAfterSplittingRegionWithLinksTestBase}} (MOB and non-MOB variants). It
> writes two store files with disjoint key ranges and splits between them so each daughter gets a
> whole-file HFileLink, then asserts:
> * the daughters contain only HFileLinks (no Reference files);
> * the cloned table contains all rows;
> * the cloned split parent records no SPLITA/SPLITB daughters (the complement of HBASE-29111:
>   with whole-file links there is no parent-to-daughter mapping to record);
> * the cloned table's data survives deletion of the source table and the snapshot followed by
>   an HFile cleaner run, i.e. the HFileLink back-references keep the archived files alive.
> This change is test-only; there is no production code change.
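
The root-cause analysis and the fix quoted above can be illustrated with a small stand-alone model. This is only a sketch of the behavior as the issue describes it, not HBase code: the class {{SplitLineageModel}}, its methods, and the {{Kind}} enum are hypothetical names, row keys are plain strings compared with {{String.compareTo}} (HBase compares row bytes), and "lineage recorded" stands in for the SPLITA/SPLITB columns that, per the description, are written only when a Reference file is restored.

```java
// Simplified model of the split behavior described in this issue.
// All names here are hypothetical; nothing below is an HBase API.
class SplitLineageModel {

  /** What a daughter region receives for one parent store file. */
  enum Kind { REFERENCE, HFILE_LINK, NONE }

  /**
   * Models the split of one store file covering [firstRow, lastRow].
   * Returns { bottomDaughterKind, topDaughterKind }, where the bottom
   * daughter holds rows below splitRow and the top daughter the rest.
   */
  static Kind[] splitStoreFile(String firstRow, String lastRow, String splitRow) {
    if (lastRow.compareTo(splitRow) < 0) {
      // File lies entirely below the split row: whole-file link, bottom only.
      return new Kind[] { Kind.HFILE_LINK, Kind.NONE };
    }
    if (firstRow.compareTo(splitRow) >= 0) {
      // File lies entirely at or above the split row: whole-file link, top only.
      return new Kind[] { Kind.NONE, Kind.HFILE_LINK };
    }
    // The split row falls inside the file: both daughters get a Reference.
    return new Kind[] { Kind.REFERENCE, Kind.REFERENCE };
  }

  /**
   * Models the clone side: the parent-to-daughter lineage is assumed to be
   * recorded only if at least one store file produced a Reference.
   * Each entry of storeFiles is a { firstRow, lastRow } pair.
   */
  static boolean lineageRecorded(String[][] storeFiles, String splitRow) {
    for (String[] file : storeFiles) {
      if (splitStoreFile(file[0], file[1], splitRow)[0] == Kind.REFERENCE) {
        return true;
      }
    }
    return false;
  }
}
```

Under this model, two store files with disjoint ranges such as {{a..f}} and {{m..z}} split at {{h}} yield only links, so no lineage is recorded (the flaky case, and the case the new links test targets). A single compacted file {{a..z}} split at {{h}} always yields References, which is the situation the fix forces by major-compacting before the split.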