Xiao Liu created HBASE-30241:
--------------------------------
Summary: TestMobCloneSnapshotFromClientAfterSplittingRegion is
flaky because a split may build whole-file HFileLinks instead of References
Key: HBASE-30241
URL: https://issues.apache.org/jira/browse/HBASE-30241
Project: HBase
Issue Type: Sub-task
Components: regionserver, test
Environment: h2. Symptom
{{TestMobCloneSnapshotFromClientAfterSplittingRegion}} (and the non-MOB
{{TestCloneSnapshotFromClientAfterSplittingRegion}}, which share a base class) are flaky.
{{testCloneSnapshotAfterSplittingRegion}} fails intermittently with:
{code}
org.opentest4j.AssertionFailedError: expected: not <null>
at ...CloneSnapshotFromClientAfterSplittingRegionTestBase.testCloneSnapshotAfterSplittingRegion:99
{code}
h2. Root cause
{{CloneSnapshotFromClientAfterSplittingRegionTestBase.splitRegion()}} splits on the second row,
expecting the snapshot to contain Reference files so that the cloned table records the
parent-to-daughter split lineage (SPLITA/SPLITB) in meta -- the behavior HBASE-29111 added the
{{assertNotNull(daughter)}} check for.
However, since HBASE-26421, when a store file lies entirely on one side of the split point, the
split builds a whole-file HFileLink for the daughter instead of a Reference. The test data is
generated from the MD5 of the wall-clock time ({{SnapshotTestingUtils.loadData}}), so the region
being split can end up with all of its store files on one side of the split row. When that
happens the snapshot contains only HFileLinks and no Reference files.
{{RestoreSnapshotHelper}} only records the parent-to-children mapping when it restores a
Reference file ({{restoreReferenceFile}}); the HFileLink path does not. So in the all-HFileLink
case the cloned table's split parent has no SPLITA/SPLITB columns,
{{MetaTableAccessor.getDaughterRegions}} returns {{(null, null)}}, and the assertion fails.
Because it depends on randomly generated keys, the failure is non-deterministic.
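The failure mode can be illustrated with a small, self-contained Java model. This is a sketch, not HBase code: {{SplitLineageModel}}, {{StoreFile}}, {{classify}} and {{recordsLineage}} are hypothetical names, row keys are modeled as strings rather than byte arrays, and the boundary handling (a file whose first row equals the split row counts as entirely in the top daughter) is an assumption.

```java
import java.util.List;

// Hypothetical model of the behavior described above; none of these names exist in HBase.
final class SplitLineageModel {

    // What a daughter region receives for one parent store file.
    enum Kind { REFERENCE, LINK }

    // A store file reduced to the first and last row it contains.
    static final class StoreFile {
        final String firstRow;
        final String lastRow;

        StoreFile(String firstRow, String lastRow) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
        }
    }

    // A file entirely on one side of the split row becomes a whole-file link;
    // a file that straddles the split row becomes a Reference.
    static Kind classify(StoreFile file, String splitRow) {
        boolean allBelow = file.lastRow.compareTo(splitRow) < 0;
        boolean allAtOrAbove = file.firstRow.compareTo(splitRow) >= 0; // assumed boundary rule
        return (allBelow || allAtOrAbove) ? Kind.LINK : Kind.REFERENCE;
    }

    // Models the restore step: the parent-to-daughter mapping is recorded only
    // when at least one Reference file is restored.
    static boolean recordsLineage(List<StoreFile> files, String splitRow) {
        for (StoreFile file : files) {
            if (classify(file, splitRow) == Kind.REFERENCE) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, store files covering rows a..c and x..z with split row m both classify as {{LINK}}, so no lineage is recorded, which corresponds to the {{(null, null)}} result the assertion trips on.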
This is purely a test-side issue. In the all-HFileLink case the daughters link directly to the
snapshot files and do not depend on the cloned parent, so not recording SPLITA/SPLITB is correct
and safe -- there is no data loss. The product behavior is fine; only the test's assumption that
a split always yields References is wrong.
h2. Fix
Make the test deterministically produce a Reference file: in {{splitRegion()}}, major-compact
each region into a single store file spanning the whole region key range before splitting. The
split row then always falls inside an existing file, which yields a Reference (and the
SPLITA/SPLITB lineage the test asserts on). Automatic compaction is disabled on the region
server in this test, so the region is compacted directly via {{HRegion.compact(true)}} (a
synchronous compaction), and auto-compaction stays disabled afterwards so the post-split
reference files are not compacted away.
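Why compacting first makes the outcome deterministic can be shown with a minimal Java model. It is a sketch under stated assumptions, not the test's actual code: {{CompactBeforeSplitModel}}, {{KeyRange}}, {{majorCompact}} and {{straddles}} are hypothetical names, row keys are plain strings, and a major compaction is reduced to merging key ranges.

```java
import java.util.List;

// Hypothetical model; it does not call any HBase API.
final class CompactBeforeSplitModel {

    // A store file reduced to the first and last row it contains.
    static final class KeyRange {
        final String firstRow;
        final String lastRow;

        KeyRange(String firstRow, String lastRow) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
        }
    }

    // Models a major compaction: all files collapse into one file that spans
    // from the smallest first row to the largest last row.
    static KeyRange majorCompact(List<KeyRange> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no store files to compact");
        }
        String first = files.get(0).firstRow;
        String last = files.get(0).lastRow;
        for (KeyRange file : files) {
            if (file.firstRow.compareTo(first) < 0) {
                first = file.firstRow;
            }
            if (file.lastRow.compareTo(last) > 0) {
                last = file.lastRow;
            }
        }
        return new KeyRange(first, last);
    }

    // True when the split row falls inside the file, i.e. the file has rows on
    // both sides of the split and would be split into References.
    static boolean straddles(KeyRange file, String splitRow) {
        return file.firstRow.compareTo(splitRow) < 0
            && file.lastRow.compareTo(splitRow) >= 0;
    }
}
```

With files covering a..c and x..z, neither straddles split row m on its own, but the compacted file a..z does. The same holds for any split row after the region's first row and no later than its last, which includes the second row the test splits on.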
h2. Additional coverage
Making the existing test deterministic means it now only exercises the Reference path. To cover
the complementary all-HFileLink path -- which was previously only hit by chance -- add
{{CloneSnapshotFromClientAfterSplittingRegionWithLinksTestBase}} (MOB and non-MOB variants). It
writes two store files with disjoint key ranges and splits between them so each daughter gets a
whole-file HFileLink, then asserts:
* the daughters contain only HFileLinks (no Reference files);
* the cloned table contains all rows;
* the cloned split parent records no SPLITA/SPLITB daughters (the complement of HBASE-29111: with whole-file links there is no parent-to-daughter mapping to record);
* the cloned table's data survives deletion of the source table and the snapshot followed by an HFile cleaner run, i.e. the HFileLink back-references keep the archived files alive.
This change is test-only; there is no production code change.
Reporter: Xiao Liu
Assignee: Xiao Liu
Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7