Jami Malikzade created SPARK-24370:
--------------------------------------
Summary: spark checkpoint creates many 0 byte empty
files(partitions) in checkpoint directory
Key: SPARK-24370
URL: https://issues.apache.org/jira/browse/SPARK-24370
Project: Spark
Issue Type: Bug
Components: Spark Shell
Affects Versions: 2.1.1
Reporter: Jami Malikzade
We are currently facing an issue: when we call checkpoint on a DataFrame, it
creates partition files in the checkpoint dir, but some of them are empty, so
we get exceptions when reading the DataFrame back.
Do you have any idea how to avoid it?
It creates 200 partitions, and some are empty. I used repartition(1) before
checkpoint, but that is not a good workaround. Is there any way to populate all
partitions with data, or to avoid the empty files?
Pasted snapshot.
!image-2018-05-23-21-10-43-673.png!
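
For anyone trying to confirm the same symptom, here is a minimal sketch of how
the empty files could be listed. It assumes the checkpoint directory is on a
local filesystem and that partition files have names starting with "part-";
both are assumptions, as are the helper names find_empty_part_files and demo,
which are hypothetical. It does not use Spark and only inspects file sizes.

```python
import os
import tempfile


def find_empty_part_files(checkpoint_dir):
    """Return sorted paths of zero-byte files whose name starts with 'part-'.

    Assumption: partition files in the checkpoint directory are named 'part-*'.
    """
    empty = []
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            if name.startswith("part-"):
                path = os.path.join(root, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)


def demo():
    """Build a throwaway directory with one non-empty and one empty part file."""
    with tempfile.TemporaryDirectory() as tmp:
        rdd_dir = os.path.join(tmp, "rdd-1")  # hypothetical subdirectory name
        os.makedirs(rdd_dir)
        with open(os.path.join(rdd_dir, "part-00000"), "wb") as f:
            f.write(b"data")
        # Create a zero-byte file to stand in for an empty partition.
        open(os.path.join(rdd_dir, "part-00001"), "wb").close()
        return [os.path.basename(p) for p in find_empty_part_files(tmp)]


if __name__ == "__main__":
    print(demo())
```

Pointing find_empty_part_files at a real checkpoint dir would give the list of
empty partition files to attach to a report like this one. For HDFS or S3 the
walk would need to go through the matching filesystem client instead of os.walk.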
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)