Dorjee Tsering created SPARK-54838:
--------------------------------------

             Summary: Optimize spark partition size
                 Key: SPARK-54838
                 URL: https://issues.apache.org/jira/browse/SPARK-54838
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 4.2.0
            Reporter: Dorjee Tsering


I am proposing to add a new function to the Dataset class to address the
small-file problem. We have noticed that when the source data a Spark job reads
consists of many small files (KB in size), the job creates a large number of
partitions. This PR adds a new function named optimizePartition which, when
called, creates partitions of 128MB each if no target partition size is passed.
You can also pass your own target partition size.
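The sizing rule described above can be sketched as follows, assuming the function derives a partition count from the total input size in bytes. The object `PartitionSizing` and the method `targetPartitionCount` are hypothetical names used only for illustration. They are not part of Spark or of this PR. The sketch is plain Scala with no Spark dependency, so it runs on its own:

```scala
// Hypothetical sketch: `PartitionSizing` and `targetPartitionCount` are
// illustrative names, not an existing Spark API or the PR's implementation.
object PartitionSizing {
  // Default target partition size described in the proposal: 128MB.
  val DefaultTargetBytes: Long = 128L * 1024 * 1024

  /** Number of partitions needed so that each holds at most about
    * `targetBytes` of data. Uses ceiling division and never returns
    * fewer than one partition (e.g. for empty input).
    */
  def targetPartitionCount(totalBytes: Long, targetBytes: Long = DefaultTargetBytes): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0, "targetBytes must be positive")
    // Ceiling division written to avoid overflow near Long.MaxValue.
    val n = totalBytes / targetBytes + (if (totalBytes % targetBytes == 0) 0 else 1)
    // Clamp to [1, Int.MaxValue] because partition counts are Ints.
    math.max(1L, math.min(n, Int.MaxValue.toLong)).toInt
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: 10,000 files of 50KB each (512,000,000 bytes).
    val total = 10000L * 50 * 1024
    println(targetPartitionCount(total))                    // 4
    println(targetPartitionCount(total, 64L * 1024 * 1024)) // 8
  }
}
```

In a Spark job, a count like this could be passed to the existing `Dataset.coalesce(n)`, which merges partitions without a full shuffle, or to `Dataset.repartition(n)`, which shuffles the data to produce more evenly sized partitions. How the proposed function estimates the total input size is not stated in this issue.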



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
