[
https://issues.apache.org/jira/browse/FLINK-38817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046487#comment-18046487
]
Bonnie Varghese commented on FLINK-38817:
-----------------------------------------
"With maxParallelism set to 1, the result will be correct even if a rescale
partitioner is used. In such cases (i.e. SortLimit (maxParallelism=1) → Sink
(maxParallelism=1)) a rescale partitioner behaves exactly the same as a forward
partitioner." Yes, I agree. But I meant the case where parallelism is set to 1
and maxParallelism is 500 (i.e. numSubpartitions).
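
One way to picture the parallelism=1, maxParallelism=500 case is the toy model
below, written in plain Java (it is not Flink code). It assumes that the single
upstream subtask spreads its sorted output round-robin over numSubpartitions
subpartitions, and that the single downstream subtask drains those
subpartitions one after another. Both assumptions, along with the class and
method names, are made up for illustration and may not match what the runtime
actually does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not part of Flink.
class RescaleOrderSketch {

    /**
     * Spreads already-sorted records round-robin over numSubpartitions buckets
     * (assumed producer behaviour), then reads the buckets sequentially
     * (assumed behaviour of a single consumer).
     */
    static List<Integer> roundRobinThenDrain(List<Integer> sorted, int numSubpartitions) {
        List<List<Integer>> subpartitions = new ArrayList<>();
        for (int i = 0; i < numSubpartitions; i++) {
            subpartitions.add(new ArrayList<>());
        }
        int next = 0;
        for (Integer record : sorted) {
            subpartitions.get(next).add(record);
            next = (next + 1) % numSubpartitions;
        }
        List<Integer> consumed = new ArrayList<>();
        for (List<Integer> subpartition : subpartitions) {
            consumed.addAll(subpartition);
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sorted.add(i);
        }
        // One subpartition: equivalent to a forward edge, order is kept.
        System.out.println(roundRobinThenDrain(sorted, 1));
        // Several subpartitions: the single consumer sees interleaved data.
        System.out.println(roundRobinThenDrain(sorted, 3));
    }
}
```

Under these assumptions, a single subpartition returns the input unchanged,
while three subpartitions return 0, 3, 6, 9, 1, 4, 7, 2, 5, 8: the consumer has
parallelism 1, yet the sorted order is lost.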
> Out of order data seen while running tpc-ds queries
> ---------------------------------------------------
>
> Key: FLINK-38817
> URL: https://issues.apache.org/jira/browse/FLINK-38817
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 2.2.0
> Reporter: Bonnie Varghese
> Priority: Major
> Attachments: screenshot-1.png
>
>
> All unspecified edges are converted to Rescale edges by default for dynamic
> graphs. Related Jira: https://issues.apache.org/jira/browse/FLINK-25046
> While testing TPC-DS queries, I observed that the ordering produced by a
> global operation is not preserved downstream because of Rescale edges.
> For SQL batch to work correctly, we should keep Forward edges after a global
> operation such as `SortLimit` or `Sort` to ensure data correctness and
> avoid out-of-order data.
> I have put my observations and experiments in this doc here:
> [https://docs.google.com/document/d/1TTj2ddlQTfDgtGb0ISmiKWt6R9U4RxJ59o6bULC1YtI/edit?usp=sharing]
>