Palash Chauhan created PHOENIX-7884:
---------------------------------------
Summary: cdcIndexUpdateLag is silent during idle / failure /
parent-replay and misattributed during ancestor replay
Key: PHOENIX-7884
URL: https://issues.apache.org/jira/browse/PHOENIX-7884
Project: Phoenix
Issue Type: Sub-task
Affects Versions: 5.3.1
Reporter: Palash Chauhan
Assignee: Palash Chauhan
Fix For: 5.4.0, 5.3.2
h4. Background
{{cdcIndexUpdateLag}} is the primary freshness signal for eventually consistent
secondary indexes. It is registered as a {{MetricHistogram}} in
{{MetricsIndexCDCConsumerSource}} and is intended to drive freshness SLOs per
data table per RegionServer.
h4. Problem
The metric is emitted at exactly two places — inside {{processCDCBatch}} (line
988) and {{processCDCBatchGenerated}} (line 1109) — both inside an {{if
(!batchMutations.isEmpty())}} block, immediately after a successful non-empty
batch. This produces three distinct bugs:
1. {*}Silent during idle / failure / startup{*}. No sample is emitted when:
* the data table is idle (the main loop sleeps on {{pollIntervalMs}} or
backoff),
* batches are repeatedly failing (the catch block only increments
{{cdcBatchFailureCount}}),
* the consumer is still starting up: waiting out {{startupDelayMs}}, or
retrying {{waitForCDCStreamEntry()}} or {{checkTrackerStatus()}}.
2. {*}Silent during parent-region replay, exactly when freshness matters
most{*}. After a region split/merge, {{run()}} calls
{{replayAndCompleteParentRegions(...)}} before the main loop starts. During
this phase — which can take hours on busy tables — the region's own new writes
accumulate in its CDC partition and are not processed. The lag metric reports
nothing about that growing backlog. The 15 s {{parentProgressPauseMs}} sleeps
inside {{processPartitionToCompletion}} are also silent.
3. {*}Mis-attribution: parent-replay timestamps pollute the per-data-table
histogram{*}. The per-batch {{updateCdcLag}} calls fire from both {{run()}} and
{{processPartitionToCompletion}}. During parent replay,
{{newLastTimestamp}} is an ancestor partition's processed timestamp
(potentially hours/days old). Those samples are tagged with {{dataTableName}}
and mix into the same histogram that represents this region's own freshness,
blurring the SLO signal.
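To make the skew in bug 3 concrete, here is a small worked example with made-up numbers. It assumes a lag sample is computed as wall-clock time minus the processed timestamp. That formula, and the names {{LagSkewExample}}, {{lagMs}} and {{percentile}}, are illustrative assumptions and not taken from the Phoenix source.

```java
import java.util.Arrays;

/** Illustrative only: shows how a few ancestor-replay samples dominate a shared histogram. */
final class LagSkewExample {

    /** Assumed lag definition: wall-clock time minus the last processed timestamp. */
    static long lagMs(long nowMs, long lastProcessedTimestampMs) {
        return nowMs - lastProcessedTimestampMs;
    }

    /** Nearest-rank percentile computed over a sorted copy of the samples. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long now = 100_000_000L;
        long sixHoursMs = 6L * 60 * 60 * 1000; // 21_600_000 ms

        // Samples from this region's own partition: a few seconds behind.
        long[] ownRegion = {
            lagMs(now, now - 2_000), lagMs(now, now - 3_000), lagMs(now, now - 2_500)
        };

        // Same samples plus two from an ancestor partition whose timestamps are six hours old.
        long[] mixed = Arrays.copyOf(ownRegion, ownRegion.length + 2);
        mixed[3] = lagMs(now, now - sixHoursMs);
        mixed[4] = lagMs(now, now - sixHoursMs);

        System.out.println("own p99   = " + percentile(ownRegion, 99)); // 3000
        System.out.println("mixed p99 = " + percentile(mixed, 99));     // 21600000
    }
}
```

With these numbers, two ancestor samples move the p99 from 3 s to 6 h even though the region's own freshness is unchanged.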
h4. Proposed fix
Decouple lag _measurement_ from batch completion, and use the consumer's own
empty-poll signal to distinguish "caught up" from "behind".
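
A minimal sketch of one way this could look, not the actual patch. The class {{LagTracker}} and its methods are hypothetical. Treating an empty poll as "caught up as of the poll time" is an assumed reading of the proposal. The idea is to keep a freshness watermark that both successful batches and empty polls advance, and to sample the lag from a timer rather than from the batch path.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: tracks a freshness watermark independently of batch completion. */
final class LagTracker {

    // Point in time up to which the index is assumed to reflect the data table.
    private final AtomicLong caughtUpToMs;

    LagTracker(long startMs) {
        this.caughtUpToMs = new AtomicLong(startMs);
    }

    /** A non-empty batch succeeded: caught up to the batch's last processed timestamp. */
    void onBatchProcessed(long lastProcessedTimestampMs) {
        advance(lastProcessedTimestampMs);
    }

    /** An empty poll: assumed to mean nothing is pending, so caught up as of the poll time. */
    void onEmptyPoll(long pollTimeMs) {
        advance(pollTimeMs);
    }

    /**
     * Intended to be sampled periodically rather than from the batch path, so a value
     * exists even when no batch completes (idle, repeated failure, startup, parent replay).
     */
    long currentLagMs(long nowMs) {
        return Math.max(0L, nowMs - caughtUpToMs.get());
    }

    // The watermark only moves forward; older timestamps never regress it.
    private void advance(long candidateMs) {
        caughtUpToMs.accumulateAndGet(candidateMs, Math::max);
    }
}
```

In this sketch, an idle but caught-up consumer reports a lag near zero because empty polls keep advancing the watermark, while a failing or replaying consumer reports a lag that grows with wall-clock time. Whether parent-replay timestamps should advance the same watermark is left open here.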