Tushar Mahale created SPARK-44499:
-------------------------------------
Summary: FileSourceScanExec OutputPartitioning for non bucketed scan
Key: SPARK-44499
URL: https://issues.apache.org/jira/browse/SPARK-44499
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1
Reporter: Tushar Mahale
FileSourceScanExec.outputPartitioning is currently calculated only for bucketed
scans; for non-bucketed scans it returns UnknownPartitioning(0). This may
result in unnecessary empty tasks being created, based on the SQLConf
defaultParallelism setting, even though the actual file may have a very low
number of partitions.
We also need to calculate and set the number of output partitions correctly for
non-bucketed scans.
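
To make the difference concrete, here is a minimal Python sketch of the
behaviour described above next to the requested one. It is a toy model, not
Spark code: the classes are simplified stand-ins for Spark's partitioning
types, and the function names, the file_partitions argument and the file names
are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class UnknownPartitioning:
    """Toy stand-in for Spark's UnknownPartitioning(numPartitions)."""
    num_partitions: int


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for the partitioning a bucketed scan reports."""
    columns: Tuple[str, ...]
    num_partitions: int


Partitioning = Union[UnknownPartitioning, HashPartitioning]


def current_output_partitioning(
    bucket_columns: Sequence[str],
    num_buckets: int,
    file_partitions: List[List[str]],
) -> Partitioning:
    # Behaviour described in this report: only a bucketed scan reports a
    # real partition count; everything else reports 0.
    if bucket_columns:
        return HashPartitioning(tuple(bucket_columns), num_buckets)
    return UnknownPartitioning(0)


def proposed_output_partitioning(
    bucket_columns: Sequence[str],
    num_buckets: int,
    file_partitions: List[List[str]],
) -> Partitioning:
    # Requested behaviour (hypothetical sketch): a non-bucketed scan reports
    # the number of file partitions it will actually read.
    if bucket_columns:
        return HashPartitioning(tuple(bucket_columns), num_buckets)
    return UnknownPartitioning(len(file_partitions))


# A small non-bucketed scan that was split into two file partitions.
small_scan = [["part-00000.parquet"], ["part-00001.parquet"]]
print(current_output_partitioning([], 0, small_scan))
print(proposed_output_partitioning([], 0, small_scan))
```

With the proposed variant, a consumer of outputPartitioning would see the scan's
real partition count instead of 0. Whether a given operator then avoids falling
back to default parallelism depends on that operator and is not modelled here.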