[ https://issues.apache.org/jira/browse/SPARK-57150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57150:
-----------------------------------
Labels: pull-request-available (was: )
> AutoCDC SCD1 Out-of-order Event Convergence Tests
> -------------------------------------------------
>
> Key: SPARK-57150
> URL: https://issues.apache.org/jira/browse/SPARK-57150
> Project: Spark
> Issue Type: Sub-task
> Components: Declarative Pipelines
> Affects Versions: 4.3.0
> Reporter: Anish Mahto
> Priority: Major
> Labels: pull-request-available
>
> A key feature of SDP's AutoCDC implementation is that it reconciles events
> that arrive out of order (by sequence). This support adds significant
> complexity to the reconciliation logic, because it requires cross-microbatch
> stateful tracking in the auxiliary table, and that logic is prone to breaking
> as the implementation evolves.
> Introduce an A/B-style test suite that runs the implementation on both a
> sequence-sorted, single-microbatch event stream and the same events in a
> shuffled, multi-microbatch event stream. If out-of-order processing is
> correct, the SCD1 implementation should produce identical target tables for
> both runs.
> Test data is randomly generated with a constant seed for reproducibility.
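
The A/B idea described above can be illustrated with a small, self-contained sketch. This is a plain-Python model, not Spark or SDP code: the event tuple layout, the helper names (apply_microbatch, run, generate_events, split_into_batches), and the latest_seq dictionary standing in for the auxiliary table are all assumptions made for illustration. It also assumes sequence numbers are unique, so ties never need to be broken.

```python
import random


def apply_microbatch(target, latest_seq, batch):
    """Apply one microbatch of (key, seq, op, value) events with SCD1 semantics.

    `latest_seq` is a hypothetical stand-in for the auxiliary table: it
    remembers the highest sequence applied per key, including deletes, so
    that a late-arriving older event cannot resurrect or overwrite a row.
    """
    for key, seq, op, value in batch:
        if seq <= latest_seq.get(key, -1):
            continue  # stale event: a newer one was already applied
        latest_seq[key] = seq
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = value


def run(batches):
    """Process microbatches in order and return the final target table."""
    target, latest_seq = {}, {}
    for batch in batches:
        apply_microbatch(target, latest_seq, batch)
    return target


def generate_events(seed, n_events=200, n_keys=10):
    """Generate reproducible random events with unique sequence numbers."""
    rng = random.Random(seed)
    events = []
    for seq in range(n_events):
        key = rng.randrange(n_keys)
        op = "delete" if rng.random() < 0.2 else "upsert"
        value = rng.randrange(1000) if op == "upsert" else None
        events.append((key, seq, op, value))
    return events


def split_into_batches(events, n_batches):
    """Split a list of events into at most n_batches contiguous chunks."""
    size = -(-len(events) // n_batches)  # ceiling division
    return [events[i:i + size] for i in range(0, len(events), size)]


events = generate_events(seed=42)

# Run A: every event in one microbatch, sorted by sequence.
baseline = run([sorted(events, key=lambda e: e[1])])

# Run B: the same events, shuffled with a fixed seed, across 8 microbatches.
shuffled = list(events)
random.Random(7).shuffle(shuffled)
candidate = run(split_into_batches(shuffled, 8))

# Convergence check: both runs must yield the same target table.
assert baseline == candidate
```

In this model the final row for each key is determined solely by that key's highest-sequence event, which is why the two runs converge. Dropping the latest_seq bookkeeping for deletes would let an older upsert reinsert a deleted row, which is the kind of regression a convergence suite of this shape is meant to catch.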