[
https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jami Malikzade updated SPARK-24370:
-----------------------------------
Attachment: partitions.PNG
> spark checkpoint creates many 0 byte empty files(partitions) in checkpoint
> directory
> -------------------------------------------------------------------------------------
>
> Key: SPARK-24370
> URL: https://issues.apache.org/jira/browse/SPARK-24370
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.1.1
> Reporter: Jami Malikzade
> Priority: Critical
> Attachments: partitions.PNG
>
>
> We are currently facing an issue: when we call checkpoint on a dataframe, it
> creates partitions in the checkpoint dir, but some of them are empty, so we
> get exceptions when reading the dataframe back.
> Do you have any idea how to avoid this?
> It creates 200 partitions, and some are empty. I used repartition(1) before
> checkpoint, but that is not a good workaround. Is there any way to populate
> all partitions with data, or to avoid the empty files?
> Pasted snapshot.
> !image-2018-05-23-21-10-43-673.png!
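
One possible explanation for the 200 partitions, offered only as an assumption about this report: 200 is the default value of spark.sql.shuffle.partitions, so a dataframe that has gone through a shuffle (join, groupBy, and so on) is hash-partitioned into 200 buckets no matter how few distinct keys it has. The sketch below is a toy model in plain Python, not Spark code. It uses Python's built-in hash() in place of Spark's own hash function, and the row data is hypothetical, but it shows why a fixed bucket count leaves most buckets empty when there are few keys.

```python
# Toy model only: this is NOT Spark code. It imitates hash partitioning
# to show why a fixed partition count can leave many partitions empty.
from collections import Counter

NUM_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Return the number of rows landing in each hash partition."""
    # Python's hash() stands in for Spark's hash function (an assumption
    # made for illustration; the real bucket assignment differs).
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: 10,000 rows but only 12 distinct integer keys.
rows = [i % 12 for i in range(10_000)]
sizes = partition_sizes(rows)
empty = sum(1 for s in sizes if s == 0)
print(f"{empty} of {NUM_PARTITIONS} partitions are empty")
# prints "188 of 200 partitions are empty"
```

If this is what is happening here, lowering spark.sql.shuffle.partitions before the shuffle, or calling coalesce(n) with a small n greater than 1 before checkpoint, may reduce the number of empty files without forcing all data into one partition the way repartition(1) does. Whether that also avoids the read-back exceptions is not confirmed by anything in this report.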