[ 
https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-44240:
---------------------------
    Description: 
 
{code:sql}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}
 

If GlobalLimitExec is not the final operator and the plan below it contains a
sort operator, the shuffle read does not guarantee the sort order, so the rows
that the limit reads may be arbitrary instead of the first rows in sort order.

TakeOrderedAndProjectExec preserves the ordering, so it does not have this
problem.
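
The failure mode can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not Spark's implementation: the function names and the block_order parameter are hypothetical, and the model only assumes that each upstream partition is sorted and that the limit concatenates shuffle blocks in whatever order they arrive.

```python
import heapq
import itertools


def global_limit_after_shuffle(sorted_partitions, limit, block_order):
    """Toy model of a limit that reads shuffle blocks in arrival order.

    block_order is a hypothetical stand-in for the order in which the
    reducer happens to receive the blocks; nothing guarantees it.
    """
    # Each upstream partition keeps at most `limit` rows (local limit).
    blocks = [part[:limit] for part in sorted_partitions]
    # Concatenate the blocks in arrival order, then cut at `limit`.
    rows = [row for i in block_order for row in blocks[i]]
    return rows[:limit]


def take_ordered(sorted_partitions, limit):
    """Toy model of an ordering-aware top-K: merge by key, then cut."""
    merged = heapq.merge(*sorted_partitions)
    return list(itertools.islice(merged, limit))


# Four locally sorted partitions that together cover 0..39.
partitions = [list(range(start, start + 10)) for start in range(0, 40, 10)]

# Blocks happen to arrive in partition order: the limit looks correct.
print(min(global_limit_after_shuffle(partitions, 10, [0, 1, 2, 3])))  # 0

# Blocks arrive in another order: the limit keeps rows 20..29.
print(min(global_limit_after_shuffle(partitions, 10, [2, 0, 3, 1])))  # 20

# The ordering-aware variant does not depend on arrival order.
print(min(take_ordered(partitions, 10)))  # 0
```

In this model the result of the plain limit depends only on which block arrives first, which matches the symptom above: min(id) is only sometimes the true minimum.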

 

!topKSortFallbackThreshold.png!

 

 

  was:
 
{code:sql}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}
 

If GlobalLimitExec is not the final operator and the plan below it contains a
sort operator, the shuffle read does not guarantee the sort order, so the rows
that the limit reads may be arbitrary instead of the first rows in sort order.

TakeOrderedAndProjectExec preserves the ordering, so it does not have this
problem.

 

!topKSortFallbackThreshold.png!

 
{code:sql}
set spark.sql.execution.topKSortFallbackThreshold=10000;
select min(id) from (select id from range(999999999) order by id desc limit 10000) a;
{code}
!topKSortFallbackThresholdDesc.png!


> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
>                 Key: SPARK-44240
>                 URL: https://issues.apache.org/jira/browse/SPARK-44240
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: topKSortFallbackThreshold.png, 
> topKSortFallbackThresholdDesc.png
>
>
>  
> {code:sql}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM (SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
> {code}
>  
> If GlobalLimitExec is not the final operator and the plan below it contains a
> sort operator, the shuffle read does not guarantee the sort order, so the rows
> that the limit reads may be arbitrary instead of the first rows in sort order.
>
> TakeOrderedAndProjectExec preserves the ordering, so it does not have this
> problem.
>  
> !topKSortFallbackThreshold.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
