dzcxzl created SPARK-44240:
------------------------------
Summary: Setting the topKSortFallbackThreshold value may lead to
inaccurate results
Key: SPARK-44240
URL: https://issues.apache.org/jira/browse/SPARK-44240
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.0, 3.3.0, 3.2.0, 3.1.0, 3.0.0, 2.4.0
Reporter: dzcxzl
{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}
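When the LIMIT is not below topKSortFallbackThreshold, the query is planned as a
global sort followed by GlobalLimitExec instead of TakeOrderedAndProjectExec.
If GlobalLimitExec is not the final operator, the shuffle read that feeds it
does not guarantee row order, so the rows kept by the limit may be arbitrary
and the query above can return a wrong min(id).
TakeOrderedAndProjectExec applies the ordering itself, so it does not have this
problem.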
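
The effect can be illustrated with a small toy model in plain Python. This is
only a sketch of the reasoning above, not Spark code: the helper names
(range_partition_sorted, limit_after_shuffle, take_ordered) and the explicit
block_order argument are hypothetical stand-ins for a sorted, range-partitioned
input, a limit applied after a shuffle read that does not preserve order, and a
top-K that re-applies the ordering.

```python
import heapq


def range_partition_sorted(rows, num_partitions):
    """Globally sort rows and split them into contiguous, ordered partitions."""
    ordered = sorted(rows)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def limit_after_shuffle(partitions, n, block_order):
    """Toy model: per-partition limit, then a global limit over blocks
    that arrive in block_order (the order is NOT guaranteed)."""
    local = [p[:n] for p in partitions]
    merged = [row for i in block_order for row in local[i]]
    return merged[:n]


def take_ordered(partitions, n):
    """Toy model: top-K per partition, then an ordered merge, so the
    arrival order of the partitions does not matter."""
    candidates = (row for p in partitions for row in heapq.nsmallest(n, p))
    return heapq.nsmallest(n, candidates)


parts = range_partition_sorted(range(100), 4)  # 0..24, 25..49, 50..74, 75..99

in_order = limit_after_shuffle(parts, 10, block_order=[0, 1, 2, 3])
reordered = limit_after_shuffle(parts, 10, block_order=[2, 0, 3, 1])

print(min(in_order))                 # 0
print(min(reordered))                # 50
print(min(take_ordered(parts, 10)))  # 0
```

In this model the limit after the shuffle returns the correct minimum only when
the blocks happen to arrive in partition order, while the ordered top-K returns
0 regardless of arrival order.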