Anish Mahto created SPARK-57150:
-----------------------------------

             Summary: AutoCDC SCD1 Out-of-order Event Convergence Tests
                 Key: SPARK-57150
                 URL: https://issues.apache.org/jira/browse/SPARK-57150
             Project: Spark
          Issue Type: Sub-task
          Components: Declarative Pipelines
    Affects Versions: 4.3.0
            Reporter: Anish Mahto


A key feature of SDP's AutoCDC implementation is that it reconciles events that
arrive out of order with respect to their sequence values. This support also
adds significant complexity to the reconciliation logic: it requires stateful
tracking across microbatches in the auxiliary table, and that logic is prone to
regressions as the implementation evolves.
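
The following is a minimal pure-Python sketch of why cross-microbatch state is
needed. It is not SDP's implementation or API. It assumes a hypothetical event
shape `(key, sequence, op, value)` with `op` either "UPSERT" or "DELETE", and
it uses plain dicts as stand-ins for the target table and the auxiliary table.
Without state, a DELETE followed in a later microbatch by an older,
late-arriving UPSERT brings the deleted row back. Remembering the highest
sequence applied per key, including keys that were deleted (tombstones),
prevents this.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape (not SDP's): (key, sequence, op, value).
Event = Tuple[str, int, str, Optional[str]]


def apply_naive(target: Dict[str, str], batch: List[Event]) -> None:
    """SCD1 merge with no cross-microbatch state: orders events only within a batch."""
    for key, _seq, op, value in sorted(batch, key=lambda e: e[1]):
        if op == "DELETE":
            target.pop(key, None)
        else:
            target[key] = value


def apply_stateful(target: Dict[str, str], aux: Dict[str, int], batch: List[Event]) -> None:
    """SCD1 merge that tolerates out-of-order events across microbatches.

    `aux` stands in for the auxiliary table. It holds the highest sequence applied
    per key and keeps that entry after a DELETE, so the deleted key acts as a tombstone.
    Assumes sequence numbers are non-negative (-1 means "nothing seen yet").
    """
    for key, seq, op, value in sorted(batch, key=lambda e: e[1]):
        if seq <= aux.get(key, -1):
            continue  # stale: a newer event for this key was already applied
        aux[key] = seq
        if op == "DELETE":
            target.pop(key, None)
        else:
            target[key] = value


# Microbatch 1 deletes k1 at sequence 3; microbatch 2 carries an older upsert.
batches = [[("k1", 3, "DELETE", None)], [("k1", 1, "UPSERT", "old")]]

naive: Dict[str, str] = {}
for b in batches:
    apply_naive(naive, b)

target: Dict[str, str] = {}
aux: Dict[str, int] = {}
for b in batches:
    apply_stateful(target, aux, b)

print(naive)   # {'k1': 'old'}  (deleted row resurrected)
print(target)  # {}             (tombstone in aux rejects the stale upsert)
```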

Introduce an A/B-style test suite that runs the implementation twice over the
same events: once as a sequence-sorted stream in a single microbatch, and once
as a shuffled stream split across multiple microbatches. If out-of-order
processing is correct, the SCD1 implementation should produce identical target
tables for both runs.

Data is randomly generated, but with a constant seed for reproducibility.
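
The A/B check can be modeled in the same pure-Python way. This sketch uses the
same assumptions: hypothetical event tuples, and dicts standing in for the
target and auxiliary tables. Its merge, `apply_stateful`, keeps the highest
applied sequence per key. The helpers `generate_events`, `split_into_batches`
and `check_convergence` are illustrative names and are not part of Spark.

- Run A applies all events, sorted by sequence, as one microbatch.
- Run B applies the same events shuffled and split into randomly sized
  microbatches.

Both the data and the shuffle are derived from fixed seeds, so any divergence
is reproducible.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (key, sequence, op, value); op is "UPSERT" or "DELETE".
Event = Tuple[str, int, str, Optional[str]]


def apply_stateful(target: Dict[str, str], aux: Dict[str, int], batch: List[Event]) -> None:
    """SCD1 merge; `aux` holds the highest applied sequence per key (incl. tombstones)."""
    for key, seq, op, value in sorted(batch, key=lambda e: e[1]):
        if seq <= aux.get(key, -1):
            continue  # stale event for this key
        aux[key] = seq
        if op == "DELETE":
            target.pop(key, None)
        else:
            target[key] = value


def run(batches: List[List[Event]]) -> Dict[str, str]:
    """Apply microbatches in order against fresh target/auxiliary state."""
    target: Dict[str, str] = {}
    aux: Dict[str, int] = {}
    for batch in batches:
        apply_stateful(target, aux, batch)
    return target


def generate_events(seed: int, num_keys: int = 5, num_events: int = 200) -> List[Event]:
    """Random CDC events with unique sequence numbers, reproducible via `seed`."""
    rng = random.Random(seed)
    events: List[Event] = []
    for seq in range(num_events):
        key = f"k{rng.randrange(num_keys)}"
        if rng.random() < 0.3:
            events.append((key, seq, "DELETE", None))
        else:
            events.append((key, seq, "UPSERT", f"v{seq}"))
    return events


def split_into_batches(events: List[Event], rng: random.Random, max_size: int) -> List[List[Event]]:
    """Cut `events` into consecutive microbatches of random size 1..max_size."""
    batches: List[List[Event]] = []
    i = 0
    while i < len(events):
        size = rng.randint(1, max_size)
        batches.append(events[i:i + size])
        i += size
    return batches


def check_convergence(seed: int) -> bool:
    events = generate_events(seed)
    # Run A: sequence-sorted events in a single microbatch.
    expected = run([sorted(events, key=lambda e: e[1])])
    # Run B: the same events shuffled and spread over many microbatches.
    rng = random.Random(seed + 1)
    shuffled = list(events)
    rng.shuffle(shuffled)
    actual = run(split_into_batches(shuffled, rng, max_size=10))
    return expected == actual


if __name__ == "__main__":
    for seed in range(20):
        assert check_convergence(seed), f"target tables diverged for seed {seed}"
    print("all seeds converged")
```

Each event has a unique sequence number. Both runs should therefore end with
each key either holding the value of its highest-sequence event, or absent if
that event is a DELETE. Comparing the two targets is the convergence check. A
real suite would compare the target tables produced by the pipeline rather than
in-memory dicts.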



--
This message was sent by Atlassian Jira
(v8.20.10#820010)