[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546792#comment-15546792 ]
Dhruve Ashar commented on SPARK-17417:
--------------------------------------
[~srowen] As far as I understand the checkpointing mechanism in Spark core, the
recovery of an RDD from a checkpoint is limited to a single application attempt.
Spark Streaming documents that it can recover metadata/RDDs from checkpointed
data across application attempts. Please correct me if I have missed something
here. With this understanding it wouldn't be necessary to parse the code for
the old filename format, as recovery would be done using the same Spark jar
that was used to launch the application.
Also, why is it that we are not cleaning up the checkpoint directory on
sc.stop()?
> Fix # of partitions for RDD while checkpointing - Currently limited by
> 10000(%05d)
> ----------------------------------------------------------------------------------
>
> Key: SPARK-17417
> URL: https://issues.apache.org/jira/browse/SPARK-17417
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Dhruve Ashar
>
> Spark currently assumes the number of partitions to be less than 100000 and
> uses %05d padding.
> If we exceed this number, the sort logic in ReliableCheckpointRDD gets messed
> up and fails. This is because the part files are sorted and compared as
> strings. The resulting filename order is part-10000, part-100000, ... instead
> of part-10000, part-10001, ..., part-100000, and the job fails while
> reconstructing the checkpointed RDD.
> Possible solutions:
> - Bump the padding to allow more partitions, or
> - Sort the part files by extracting the numeric sub-portion of the name, and
> then verify the RDD
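The failure mode and the second proposed fix can be illustrated with a minimal
Python sketch (this is not Spark's actual code from ReliableCheckpointRDD; the
file names and sort key below are illustrative only). Once an index needs more
digits than the %05d padding provides, lexicographic order diverges from
numeric order:

```python
# Part-file names as produced by "%05d" padding; indices >= 100000
# overflow the 5-digit pad, so string comparison interleaves them
# with smaller indices (e.g. part-100000 sorts before part-99999).
names = ["part-%05d" % i for i in (9999, 10000, 99999, 100000)]

# Broken: plain string sort.
lexicographic = sorted(names)

# Proposed fix: sort by the numeric sub-portion of the file name.
numeric = sorted(names, key=lambda n: int(n.rsplit("-", 1)[1]))

print(lexicographic)  # ['part-09999', 'part-10000', 'part-100000', 'part-99999']
print(numeric)        # ['part-09999', 'part-10000', 'part-99999', 'part-100000']
```

Sorting on the extracted integer is padding-independent, so it stays correct
no matter how many partitions there are; bumping the padding width only moves
the limit further out.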
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)