[ 
https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488330#comment-16488330
 ] 

Hyukjin Kwon commented on SPARK-24370:
--------------------------------------

This sounds more like a question than a bug report. Do you have a reproducer or 
steps to reproduce this? That would help others, like me, reproduce and debug 
the problem.

> spark checkpoint creates many 0 byte empty files (partitions) in checkpoint 
> directory
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-24370
>                 URL: https://issues.apache.org/jira/browse/SPARK-24370
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.1.1
>            Reporter: Jami Malikzade
>            Priority: Major
>         Attachments: partitions.PNG
>
>
> We are currently facing an issue: when we call checkpoint on a DataFrame, it 
> creates partitions in the checkpoint dir, but some of them are empty, so we 
> get exceptions when reading the DataFrame back.
> Do you have any idea how to avoid it?
> It creates 200 partitions, and some are empty. I used repartition(1) before 
> checkpoint, but that is not a good workaround. Is there any way to populate 
> all partitions with data, or to avoid empty files?
> Snapshot pasted below.
> !image-2018-05-23-21-10-43-673.png!
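
The 200 partitions in the report match Spark's default value for 
spark.sql.shuffle.partitions, which suggests (this is an assumption, since the 
report does not show the query) that the DataFrame was produced by a shuffle 
such as a join or aggregation. The sketch below is plain Python, not Spark 
code. It only illustrates why hash-partitioning data with few distinct keys 
into 200 buckets leaves most buckets empty. The data, the function names, and 
the use of Python's built-in hash() are all hypothetical stand-ins.

```python
from collections import Counter

# Spark's default for spark.sql.shuffle.partitions.
NUM_SHUFFLE_PARTITIONS = 200


def partition_sizes(keys, num_partitions=NUM_SHUFFLE_PARTITIONS):
    """Return the row count per partition when each row is routed to
    hash(key) % num_partitions.

    Python's hash() is a stand-in here; Spark uses its own hash function.
    """
    counts = Counter(hash(key) % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def count_empty(sizes):
    """Count partitions that received no rows."""
    return sum(1 for size in sizes if size == 0)


# Hypothetical data: 10,000 rows but only 20 distinct integer keys.
rows = [i % 20 for i in range(10_000)]
sizes = partition_sizes(rows)
print(f"{count_empty(sizes)} of {len(sizes)} partitions are empty")
```

Whatever hash function is used, N distinct keys can fill at most N partitions, 
so at least 200 - N of them stay empty. If each partition is written out as 
its own file at checkpoint time, those would show up as files with no records. 
Under that assumption, reducing the partition count before the checkpoint (as 
the reporter did with repartition(1), though a value larger than 1 would keep 
some parallelism) or lowering spark.sql.shuffle.partitions would reduce the 
number of empty files.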



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
