Dorjee Tsering created SPARK-54838:
--------------------------------------
Summary: Optimize spark partition size
Key: SPARK-54838
URL: https://issues.apache.org/jira/browse/SPARK-54838
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Dorjee Tsering
I am proposing to add a new function to the Dataset class to address the
small-file problem. We have noticed that when the source data our Spark job
reads consists of many small files (a few KB each), the job creates a large
number of partitions. This PR adds a new function named optimizePartition
which, when called, creates partitions of roughly 128MB each if no target
partition size is passed. You can also pass your own desired partition size.
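The issue does not show the implementation. As a rough illustration only, the sizing logic implied by a 128MB default target can be sketched as two hypothetical helpers. `target_partition_count` and `pack_files` are assumed names, not part of the proposed PR or of any Spark API. In Spark, a count like this would presumably be passed to the existing `Dataset.repartition` or `Dataset.coalesce` methods.

```python
from typing import List

# Default target size taken from the proposal: 128 MB per partition.
DEFAULT_TARGET_BYTES = 128 * 1024 * 1024


def target_partition_count(total_bytes: int,
                           target_bytes: int = DEFAULT_TARGET_BYTES) -> int:
    """Hypothetical helper: partitions needed so each holds ~target_bytes.

    Uses integer ceiling division and never returns fewer than 1.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    return max(1, (total_bytes + target_bytes - 1) // target_bytes)


def pack_files(file_sizes: List[int],
               target_bytes: int = DEFAULT_TARGET_BYTES) -> List[List[int]]:
    """Hypothetical helper: greedily group file sizes into partitions.

    Files are taken largest first and appended to the current partition
    until adding the next file would exceed target_bytes. A single file
    larger than target_bytes ends up alone in its own partition.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    partitions: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_bytes + size > target_bytes:
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        partitions.append(current)
    return partitions
```

Consider 10,000 input files of 64 KB each (about 625 MB total). `target_partition_count` returns 5, and `pack_files` produces 5 groups of 2048, 2048, 2048, 2048 and 1808 files. That yields a handful of ~128MB partitions instead of many tiny ones.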
--
This message was sent by Atlassian Jira
(v8.20.10#820010)