[ 
https://issues.apache.org/jira/browse/HBASE-30241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Liu updated HBASE-30241:
-----------------------------
    Status: Patch Available  (was: In Progress)

> TestMobCloneSnapshotFromClientAfterSplittingRegion is flaky because a split 
> may build whole-file HFileLinks instead of References
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-30241
>                 URL: https://issues.apache.org/jira/browse/HBASE-30241
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, test
>            Reporter: Xiao Liu
>            Assignee: Xiao Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
>
>
> h2. Symptom
> {{TestMobCloneSnapshotFromClientAfterSplittingRegion}} and its non-MOB
> counterpart {{TestCloneSnapshotFromClientAfterSplittingRegion}}, which share
> a base class, are flaky.
> {{testCloneSnapshotAfterSplittingRegion}} fails intermittently with:
> {code}
> org.opentest4j.AssertionFailedError: expected: not <null>
> at ...CloneSnapshotFromClientAfterSplittingRegionTestBase.testCloneSnapshotAfterSplittingRegion:99
> {code}
> h2. Root cause
> {{CloneSnapshotFromClientAfterSplittingRegionTestBase.splitRegion()}} splits
> on the second row, expecting the snapshot to contain Reference files so that
> the cloned table records the parent-to-daughter split lineage (SPLITA/SPLITB)
> in meta. This is the behavior that the {{assertNotNull(daughter)}} check
> added in HBASE-29111 verifies.
> However, since HBASE-26421, when a store file lies entirely on one side of
> the split point, the split builds a whole-file HFileLink for the daughter
> instead of a Reference. The test data is generated from the MD5 of the
> wall-clock time ({{SnapshotTestingUtils.loadData}}), so the region being
> split can end up with all of its store files on one side of the split row.
> When that happens, the snapshot contains only HFileLinks and no Reference
> files.
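
The decision described in the paragraph above can be sketched as a small standalone model. This is not HBase code: the class `SplitFileModel`, the `StoreFileRange` type, and the reduction of a store file to its first and last row keys are all assumptions made for illustration, and rows are compared as plain strings rather than byte arrays.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified row-level model of the split decision; hypothetical, not HBase code. */
class SplitFileModel {
  /** What the split produces for one parent store file. */
  enum Outcome { LINK_IN_BOTTOM_DAUGHTER, LINK_IN_TOP_DAUGHTER, REFERENCES_IN_BOTH }

  /** A store file reduced to its inclusive first and last row keys (an assumption). */
  static final class StoreFileRange {
    final String firstRow;
    final String lastRow;

    StoreFileRange(String firstRow, String lastRow) {
      this.firstRow = firstRow;
      this.lastRow = lastRow;
    }
  }

  /** The bottom daughter owns rows below splitRow; the top daughter owns splitRow and above. */
  static Outcome classify(StoreFileRange file, String splitRow) {
    if (file.lastRow.compareTo(splitRow) < 0) {
      return Outcome.LINK_IN_BOTTOM_DAUGHTER; // whole file lies below the split row
    }
    if (file.firstRow.compareTo(splitRow) >= 0) {
      return Outcome.LINK_IN_TOP_DAUGHTER; // whole file lies at or above the split row
    }
    return Outcome.REFERENCES_IN_BOTH; // file straddles the split row
  }

  /** True if at least one store file straddles the split row. */
  static boolean splitProducesReference(List<StoreFileRange> files, String splitRow) {
    for (StoreFileRange file : files) {
      if (classify(file, splitRow) == Outcome.REFERENCES_IN_BOTH) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<StoreFileRange> files =
        Arrays.asList(new StoreFileRange("0a", "3f"), new StoreFileRange("41", "9c"));
    // Every file is below "a0": only whole-file links, no Reference.
    System.out.println(splitProducesReference(files, "a0")); // false
    // "50" falls inside the second file, so that file is split into References.
    System.out.println(splitProducesReference(files, "50")); // true
  }
}
```

With keys derived from a hash, nothing prevents the first case, which is the situation in which the snapshot carries no Reference files.
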
> {{RestoreSnapshotHelper}} records the parent-to-children mapping only when
> it restores a Reference file ({{restoreReferenceFile}}); the HFileLink path
> does not. So in the all-HFileLink case the cloned table's split parent has
> no SPLITA/SPLITB columns, {{MetaTableAccessor.getDaughterRegions}} returns
> {{(null, null)}}, and the assertion fails. Because the outcome depends on
> randomly generated keys, the failure is non-deterministic.
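
To make the consequence concrete, here is a minimal sketch of lineage being collected only from Reference files. It is a toy model under stated assumptions, not the `RestoreSnapshotHelper` implementation: `RestoreLineageModel`, `SnapshotFile`, and `collectParentToDaughters` are hypothetical names, and region names are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of lineage collection during restore; hypothetical, not HBase code. */
class RestoreLineageModel {
  enum Kind { REFERENCE, HFILE_LINK }

  /** One file in a daughter region of the snapshot (simplified assumption). */
  static final class SnapshotFile {
    final Kind kind;
    final String parentRegion;
    final String daughterRegion;

    SnapshotFile(Kind kind, String parentRegion, String daughterRegion) {
      this.kind = kind;
      this.parentRegion = parentRegion;
      this.daughterRegion = daughterRegion;
    }
  }

  /** Records parent -> daughters only for Reference files; links contribute nothing. */
  static Map<String, List<String>> collectParentToDaughters(List<SnapshotFile> files) {
    Map<String, List<String>> parents = new TreeMap<>();
    for (SnapshotFile file : files) {
      if (file.kind != Kind.REFERENCE) {
        continue; // a whole-file link does not depend on the parent region
      }
      List<String> daughters =
          parents.computeIfAbsent(file.parentRegion, k -> new ArrayList<>());
      if (!daughters.contains(file.daughterRegion)) {
        daughters.add(file.daughterRegion);
      }
    }
    return parents;
  }
}
```

In this model an all-link snapshot yields an empty map, which corresponds to the cloned parent having no SPLITA/SPLITB columns and the test's non-null assertion failing.
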
> This is purely a test-side issue. In the all-HFileLink case the daughters
> link directly to the snapshot files and do not depend on the cloned parent,
> so not recording SPLITA/SPLITB is correct and safe -- there is no data loss.
> The product behavior is fine; only the test's assumption that a split always
> yields References is wrong.
> h2. Fix
> Make the test deterministically produce a Reference file: in
> {{splitRegion()}}, major-compact each region into a single store file
> spanning the whole region key range before splitting. The split row then
> always falls inside an existing file, which yields a Reference (and the
> SPLITA/SPLITB lineage the test asserts on). Automatic compaction is disabled
> on the region server in this test, so the region is compacted directly via
> {{HRegion.compact(true)}} (a synchronous compaction), and auto-compaction
> stays disabled afterwards so the post-split reference files are not
> compacted away.
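
Why compacting first makes the outcome deterministic can be illustrated with a small range model. It is a sketch under assumptions, not the test's code: `CompactThenSplitModel`, `Range`, `majorCompact`, and `straddles` are hypothetical names, a store file is reduced to its first and last row, and a major compaction is modelled as merging all ranges into one.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of "major-compact, then split"; hypothetical, not HBase code. */
class CompactThenSplitModel {
  /** A store file reduced to its inclusive first and last row keys (an assumption). */
  static final class Range {
    final String first;
    final String last;

    Range(String first, String last) {
      this.first = first;
      this.last = last;
    }
  }

  /** Models a major compaction as one file covering the union of all input ranges. */
  static Range majorCompact(List<Range> files) {
    if (files.isEmpty()) {
      throw new IllegalArgumentException("no store files to compact");
    }
    String first = files.get(0).first;
    String last = files.get(0).last;
    for (Range r : files) {
      if (r.first.compareTo(first) < 0) first = r.first;
      if (r.last.compareTo(last) > 0) last = r.last;
    }
    return new Range(first, last);
  }

  /** True if the file has rows on both sides of splitRow, i.e. a Reference is needed. */
  static boolean straddles(Range file, String splitRow) {
    return file.first.compareTo(splitRow) < 0 && file.last.compareTo(splitRow) >= 0;
  }

  public static void main(String[] args) {
    List<Range> files = Arrays.asList(new Range("a", "c"), new Range("d", "f"));
    String splitRow = "d";
    for (Range r : files) {
      System.out.println(straddles(r, splitRow)); // false for both files
    }
    System.out.println(straddles(majorCompact(files), splitRow)); // true
  }
}
```

In the model, any split row that is above the region's first row and not above its last row lands inside the single compacted file. The same two-file input, left uncompacted and split between the files, is the shape of the all-HFileLink scenario.
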
> h2. Additional coverage
> Making the existing test deterministic means it now only exercises the
> Reference path. To cover the complementary all-HFileLink path -- which was
> previously only hit by chance -- add
> {{CloneSnapshotFromClientAfterSplittingRegionWithLinksTestBase}} (MOB and
> non-MOB variants). It writes two store files with disjoint key ranges and
> splits between them so each daughter gets a whole-file HFileLink, then
> asserts:
> * the daughters contain only HFileLinks (no Reference files);
> * the cloned table contains all rows;
> * the cloned split parent records no SPLITA/SPLITB daughters (the complement of HBASE-29111: with whole-file links there is no parent-to-daughter mapping to record);
> * the cloned table's data survives deletion of the source table and the snapshot followed by an HFile cleaner run, i.e. the HFileLink back-references keep the archived files alive.
> This change is test-only; there is no production code change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
