aIbrahiim commented on PR #37950: URL: https://github.com/apache/beam/pull/37950#issuecomment-4132399063
> Nice, the per-line split correctly removes the dependency on non-deterministic flush boundaries, and should indeed fix the flakes. > > One question: the original assertion also implicitly verified intra-transaction ordering (e.g., that the DELETE for SingerId 1 always followed the INSERT for SingerId 1+2 within the same commit). With `containsInAnyOrder` on individual lines, that ordering property is no longer asserted. > > Is that intentional, or would it be worth adding a lightweight ordering check per commit timestamp / transaction ID on top of this? @bvolpato Great point and you’re absolutely right. that relaxation is intentional for this test as the flake came from timer flush grouping, so I changed the assertion to validate stable semantics only (presence/count of records) and removed dependence on grouping order across flushes also this is right that this drops an implicit ordering signal. I dont want to lose that coverage entirely, so I can add a small follow up assertion that checks ordering using the existing sort key (commitTimestamp, transactionId) in a deterministic way, without relying on flush chunk boundaries -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
