[jira] [Created] (SPARK-44499) FileSourceScanExec OutputPartitioning for non bucketed scan

Tushar Mahale (Jira) Thu, 20 Jul 2023 06:43:36 -0700

Tushar Mahale created SPARK-44499:
-------------------------------------

             Summary: FileSourceScanExec OutputPartitioning for non bucketed scan
                 Key: SPARK-44499
                 URL: https://issues.apache.org/jira/browse/SPARK-44499
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.1
            Reporter: Tushar Mahale



FileSourceScanExec.outputPartitioning is currently calculated only for bucketed 
scans; for non-bucketed scans, it returns UnknownPartitioning(0). This may 
result in the creation of unnecessary empty tasks, based on the SQLConf 
defaultParallelism setting, even though the actual file may have very few 
partitions.

We also need to calculate and set the number of output partitions correctly for 
non-bucketed scans.
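
To make the difference concrete, here is a toy model in plain Python. It is not 
Spark code: ScanModel, reported_partitioning and the fix_non_bucketed flag are 
hypothetical names used only for illustration, and the two partitioning classes 
merely mirror the names of their Spark counterparts. It assumes the bucketed 
path reports a HashPartitioning over the bucket columns and sketches what the 
non-bucketed path reports today versus what this issue asks for.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UnknownPartitioning:
    # Toy stand-in: only carries the number of partitions.
    num_partitions: int


@dataclass(frozen=True)
class HashPartitioning:
    # Toy stand-in: bucket columns plus the number of partitions.
    columns: Tuple[str, ...]
    num_partitions: int


@dataclass(frozen=True)
class ScanModel:
    # Hypothetical description of a file scan; not a Spark class.
    num_file_partitions: int  # partitions the scan actually produces
    bucket_columns: Optional[Tuple[str, ...]] = None
    num_buckets: int = 0

    @property
    def bucketed(self) -> bool:
        return self.bucket_columns is not None and self.num_buckets > 0


def reported_partitioning(scan: ScanModel, fix_non_bucketed: bool):
    """Return the partitioning the scan would report to the planner."""
    if scan.bucketed:
        # Bucketed scans already report a meaningful partitioning.
        return HashPartitioning(scan.bucket_columns, scan.num_buckets)
    if fix_non_bucketed:
        # Proposed behavior: report the real number of file partitions.
        return UnknownPartitioning(scan.num_file_partitions)
    # Behavior described in this issue: the count is always 0.
    return UnknownPartitioning(0)


small_scan = ScanModel(num_file_partitions=3)
print(reported_partitioning(small_scan, fix_non_bucketed=False))
print(reported_partitioning(small_scan, fix_non_bucketed=True))
```

With the flag off, the model reports zero partitions even though the scan 
produces three; with it on, the reported count matches the scan.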



