[
https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925552#comment-17925552
]
Terry Wang commented on SPARK-44240:
------------------------------------
We have also encountered the same problem :(.
To fix it, we may need to add a sort before GlobalLimitExec.
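
To make the suggestion concrete, here is a toy model in plain Python. It is not Spark code, and the function names (local_limit, shuffle_read, global_limit) are hypothetical. It assumes the plan shape is roughly: a range-partitioned sort, a per-partition limit, a shuffle into a single partition, then the global limit. The arrival order of the shuffle blocks is passed in explicitly to stand in for the unordered shuffle read.

```python
def local_limit(partition, k):
    # Each sorted partition keeps only its first k rows.
    return partition[:k]


def shuffle_read(blocks, arrival_order):
    # Toy stand-in for reading all blocks into a single partition.
    # Blocks are concatenated in arrival order, not in partition order.
    return [row for i in arrival_order for row in blocks[i]]


def global_limit(rows, k):
    # Takes the first k rows of whatever order it is given.
    return rows[:k]


k = 3
# Range-partitioned and sorted: partition 0 holds the smallest ids.
partitions = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
blocks = [local_limit(p, k) for p in partitions]

# Assumed arrival order for illustration; a real shuffle read may use any order.
merged = shuffle_read(blocks, arrival_order=[2, 0, 3, 1])

without_sort = global_limit(merged, k)        # limit applied to unordered input
with_sort = global_limit(sorted(merged), k)   # sort added before the limit

print(min(without_sort), min(with_sort))
```

In this model the limit without the extra sort returns rows from whichever block happened to arrive first, while sorting the merged rows first restores the expected top-k result.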
> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
> Key: SPARK-44240
> URL: https://issues.apache.org/jira/browse/SPARK-44240
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
> Reporter: dzcxzl
> Priority: Minor
> Attachments: topKSortFallbackThreshold.png,
> topKSortFallbackThresholdDesc.png
>
>
>
> {code:sql}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
> {code}
>
> If GlobalLimitExec is not the final operator and the plan contains a sort
> operator, the shuffle read does not guarantee row order, so the rows the
> limit reads may be arbitrary.
> TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.
>
> !topKSortFallbackThreshold.png!
> {code:sql}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id DESC LIMIT 10000) a;
> {code}
> !topKSortFallbackThresholdDesc.png!
>