From what I understood from Imran's mail (and what was referenced in it),
the RDD mentioned seems to be violating some basic contracts on how
partitions are used in Spark [1]: partitions cannot be arbitrarily
numbered, have duplicate indices, etc.


Extending RDD to add functionality is typically for niche cases, and it
requires subclasses to adhere to the explicit (and implicit)
contracts/lifecycles that come with it.
Using existing RDDs as templates is a good starting point for
customizations - one way to look at it is that using RDDs is more in API
space, while extending them is more in SPI space.

Violations would actually not even be detectable by spark-core in the general case.
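
For concreteness, here is a minimal sketch of a custom RDD whose
getPartitions() satisfies that contract (0-based, unique indices, with
partitions(i).index == i); MyRDD and MyPartition are hypothetical names
used only for illustration:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: its index is fixed at construction time
class MyPartition(override val index: Int) extends Partition

class MyRDD(sc: SparkContext, numPartitions: Int) extends RDD[Int](sc, Nil) {

  // Contract: slot i of the returned array holds the partition whose
  // index is i - no gaps, no duplicates, starting at 0
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numPartitions)(i => new MyPartition(i))

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index) // placeholder record per partition
}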


Regards,
Mridul

[1] Ignoring the array out of bounds, etc. - I am assuming the intent is
to show overlapping partitions, duplicates, index-to-partition mismatch -
that sort of thing.


On Thu, Aug 13, 2015 at 11:42 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> Yep, and it works fine for operations which do not involve any shuffle
> (like foreach, count, etc.), but those which involve shuffle operations end
> up in an infinite loop. Spark should somehow indicate this instead of going
> into an infinite loop.
>
> Thanks
> Best Regards
>
> On Thu, Aug 13, 2015 at 11:37 PM, Imran Rashid <iras...@cloudera.com> wrote:
>>
>> oh I see, you are defining your own RDD & Partition types, and you had a
>> bug where partition.index did not line up with the partition's slot in
>> rdd.getPartitions.  Is that correct?
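>>
>> A quick sanity check of that invariant would look something like this
>> (a sketch, with rdd standing in for the custom RDD in question):
>>
>> rdd.partitions.zipWithIndex.foreach { case (p, i) =>
>>   require(p.index == i, s"partition at slot $i reports index ${p.index}")
>> }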
>>
>> On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>>
>>> I figured that out, and these are my findings:
>>>
>>> -> It just enters an infinite loop when there is a duplicate partition
>>> id.
>>>
>>> -> It enters an infinite loop when the partition ids start from 1
>>> rather than 0.
>>>
>>>
>>> Something like this piece of code (in getPartitions()) can reproduce it:
>>>
>>> // Intentionally violates the partition contract: the indices are 1-based
>>> // and duplicated across the two outer iterations. (Note that the array
>>> // only has room for total_partitions entries, so the second outer pass
>>> // also overruns it; sizing it as 2 * total_partitions lets the snippet
>>> // run long enough to hit the hang.)
>>> val total_partitions = 4
>>> val partitionsArray: Array[Partition] =
>>>   Array.ofDim[Partition](total_partitions)
>>>
>>> var i = 0
>>> for (outer <- 0 to 1) {
>>>   for (partition <- 1 to total_partitions) {
>>>     partitionsArray(i) = new DeadLockPartitions(partition) // index 1..4, twice
>>>     i = i + 1
>>>   }
>>> }
>>>
>>> partitionsArray
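>>>
>>> For contrast, a contract-respecting version of the same snippet (a sketch,
>>> assuming DeadLockPartitions(i) reports i as its index) would fill one slot
>>> per partition with unique, 0-based indices:
>>>
>>> val total_partitions = 4
>>> val partitionsArray: Array[Partition] =
>>>   Array.tabulate[Partition](total_partitions)(i => new DeadLockPartitions(i))
>>>
>>> partitionsArray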
>>>
>>>
>>>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Wed, Aug 12, 2015 at 10:57 PM, Imran Rashid <iras...@cloudera.com>
>>> wrote:
>>>>
>>>> yikes.
>>>>
>>>> Was this a one-time thing?  Or does it happen consistently?  Can you
>>>> turn on debug logging for o.a.s.scheduler? (dunno if it will help, but maybe
>>>> ...)
>>>>
>>>> On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das <ak...@sigmoidanalytics.com>
>>>> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> My Spark job (running in local[*] with Spark 1.4.1) reads data from a
>>>>> thrift server through a custom RDD: the partitions are computed in the
>>>>> getPartitions() call, and compute() returns an iterator over the records
>>>>> of each partition. count() and foreach() work fine and return the correct
>>>>> number of records. But whenever there is a ShuffleMapStage (reduceByKey
>>>>> etc.), all the tasks execute properly, yet it enters an infinite loop
>>>>> saying:
>>>>>
>>>>> 15/08/11 13:05:54 INFO DAGScheduler: Resubmitting ShuffleMapStage 1
>>>>> (map at FilterMain.scala:59) because some of its tasks had failed: 0, 3
>>>>>
>>>>>
>>>>> Here's the complete stack-trace http://pastebin.com/hyK7cG8S
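>>>>>
>>>>> For example (a sketch, with customRdd standing in for the thrift-backed
>>>>> RDD described above):
>>>>>
>>>>> customRdd.count()          // fine: no shuffle involved
>>>>> customRdd.foreach(println) // fine: no shuffle involved
>>>>> customRdd.map(r => (r, 1)).reduceByKey(_ + _).collect() // ShuffleMapStage: keeps resubmitting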
>>>>>
>>>>> What could be the root cause of this problem? I looked it up and only
>>>>> bumped into this closed JIRA (which is very, very old):
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>
>>>>
>>>
>>
>

