Anish Mahto created SPARK-57150:
-----------------------------------
Summary: AutoCDC SCD1 Out-of-order Event Convergence Tests
Key: SPARK-57150
URL: https://issues.apache.org/jira/browse/SPARK-57150
Project: Spark
Issue Type: Sub-task
Components: Declarative Pipelines
Affects Versions: 4.3.0
Reporter: Anish Mahto
A key feature of the Spark Declarative Pipelines (SDP) AutoCDC implementation is
that it reconciles events that arrive out of order with respect to their
sequence. This support also adds significant complexity to the reconciliation
logic, because it requires stateful tracking in the auxiliary table across
microbatches, and that logic is prone to regressions as the implementation
evolves.
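To make the cross-microbatch state concrete, here is a minimal Python sketch of sequence-based SCD1 reconciliation. It is not the SDP AutoCDC implementation. `ToyScd1`, `Event`, and the in-memory `tombstones` map are hypothetical, a delete is modeled as an event whose value is `None`, and the sketch assumes the auxiliary table records at least the sequence of each key's latest delete. That record is what allows a late, older upsert to be discarded after the row was already deleted in an earlier microbatch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Event:
    key: str
    seq: int               # sequence value used for ordering (hypothetical field name)
    value: Optional[str]   # None marks a delete in this sketch


class ToyScd1:
    """Hypothetical in-memory SCD1 applier; not the SDP AutoCDC implementation."""

    def __init__(self) -> None:
        # Target table: key -> (latest applied sequence, value).
        self.target: Dict[str, Tuple[int, str]] = {}
        # Stand-in for the auxiliary table: key -> sequence of the latest delete.
        # It persists across microbatches so late, older upserts can be dropped.
        self.tombstones: Dict[str, int] = {}

    def apply_microbatch(self, events: Iterable[Event]) -> None:
        # Within a batch, process events in sequence order.
        for e in sorted(events, key=lambda ev: ev.seq):
            latest = max(
                self.target.get(e.key, (-1, ""))[0],
                self.tombstones.get(e.key, -1),
            )
            if e.seq <= latest:
                continue  # stale event: something newer was already applied
            if e.value is None:
                self.target.pop(e.key, None)
                self.tombstones[e.key] = e.seq
            else:
                self.target[e.key] = (e.seq, e.value)


scd1 = ToyScd1()
scd1.apply_microbatch([Event("a", 5, None)])   # the delete arrives first
scd1.apply_microbatch([Event("a", 3, "old")])  # an older upsert arrives in a later batch
print(scd1.target)  # {}
```

Without the tombstone map, the late upsert at sequence 3 would find no existing row for `a` and recreate it, which is exactly the kind of out-of-order bug the proposed suite is meant to catch.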
Introduce an A/B-style test suite that runs the implementation on two inputs:
the events sorted by sequence and delivered as a single microbatch, and the
same events shuffled and split across multiple microbatches. If out-of-order
processing is correct, the SCD1 implementation should produce identical target
tables for both runs.
Data is randomly generated, but with a constant seed for reproducibility.
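The A/B check can be sketched in plain Python as below. All names (`generate_events`, `to_microbatches`, `apply_scd1`, `assert_converges`) and the seed and batch-count defaults are hypothetical, and `apply_scd1` is a toy in-memory stand-in for the real SCD1 flow; an actual suite would run pipelines and compare target tables rather than dictionaries. Sequence values are generated unique so that the expected result is well defined.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event shape: (key, seq, value); value None means delete.
Event = Tuple[str, int, Optional[str]]
Target = Dict[str, str]


def generate_events(seed: int, n: int = 200, n_keys: int = 10) -> List[Event]:
    """Random upserts and deletes with unique sequence values, from a fixed seed."""
    rng = random.Random(seed)
    events = []
    for seq in range(n):
        key = f"k{rng.randrange(n_keys)}"
        value = None if rng.random() < 0.2 else f"v{seq}"
        events.append((key, seq, value))
    return events


def to_microbatches(events: List[Event], seed: int, n_batches: int) -> List[List[Event]]:
    """Shuffle the events deterministically and split them into microbatches."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    size = max(1, -(-len(shuffled) // n_batches))  # ceiling division
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]


def apply_scd1(batches: List[List[Event]]) -> Target:
    """Toy SCD1 apply: keep the highest-sequence event per key across batches."""
    state: Dict[str, Tuple[int, Optional[str]]] = {}
    for batch in batches:
        for key, seq, value in sorted(batch, key=lambda e: e[1]):
            if seq > state.get(key, (-1, None))[0]:
                state[key] = (seq, value)  # deletes are kept as tombstones
    # Tombstoned keys (value None) are not part of the target table.
    return {k: v for k, (_, v) in state.items() if v is not None}


def assert_converges(apply: Callable[[List[List[Event]]], Target],
                     seed: int = 42, n_batches: int = 7) -> None:
    events = generate_events(seed)
    # A: sorted by sequence, one microbatch. B: shuffled, several microbatches.
    expected = apply([sorted(events, key=lambda e: e[1])])
    actual = apply(to_microbatches(events, seed, n_batches))
    assert actual == expected, f"diverged for seed={seed}, n_batches={n_batches}"


assert_converges(apply_scd1)
```

Because the harness takes the apply function as a parameter, the same check can be pointed at any candidate implementation. Varying `seed` and `n_batches` widens coverage while keeping every failing combination reproducible.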
--
This message was sent by Atlassian Jira
(v8.20.10#820010)