[ https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488329#comment-16488329 ]
Hyukjin Kwon commented on SPARK-24370:
--------------------------------------

(Let's avoid setting the priority to Critical; that is usually reserved for committers.)

> spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-24370
>                 URL: https://issues.apache.org/jira/browse/SPARK-24370
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.1.1
>            Reporter: Jami Malikzade
>            Priority: Major
>         Attachments: partitions.PNG
>
>
> We are currently facing an issue: when we call checkpoint on a DataFrame, it
> creates partitions in the checkpoint directory, but some of them are empty,
> so we get exceptions when reading the DataFrame back.
> Do you have any idea how to avoid this?
> It creates 200 partitions, and some are empty. I used repartition(1) before
> checkpoint, but that is not a good workaround. Is there any way to populate
> all partitions with data, or to avoid the empty files?
> Pasted snapshot.
> !image-2018-05-23-21-10-43-673.png!