Thanks, JaeBoo. In our case, the shuffle writes are similar.

2015-01-21 17:01 GMT+08:00 JaeBoo Jung <>:

> I recently faced a similar issue, but unfortunately I could
> not find out why it happened.
> Here's the JIRA ticket for my
> previous post.
> Please compare the shuffle I/O of the two versions in the Spark web UI,
> because your problem may be related to my case.
> Thanks
> Kevin
> ------- *Original Message* -------
> *Sender* : Fengyun RAO<>
> *Date* : 2015-01-21 17:41 (GMT+09:00)
> *Title* : Re: spark 1.2 three times slower than spark 1.1, why?
> Maybe you mean a different spark-submit script?
> We also use the same spark-submit script, and thus the same memory, cores,
> and other configuration.
>
> 2015-01-21 15:45 GMT+08:00 Sean Owen <>:
>> I don't know of any reason to think the singleton pattern doesn't work or
>> works differently. I wonder if, for example, task scheduling is different
>> in 1.2 and you have more partitions across more workers and so are loading
>> more copies more slowly into your singletons.
>>  On Jan 21, 2015 7:13 AM, "Fengyun RAO" <> wrote:
>>> The LogParser instance is not serializable, and thus cannot be
>>> broadcast.
>>> What’s worse, it contains an LRU cache, which is essential to
>>> performance and which we would like to share among all the tasks on the
>>> same node.
>>> If that is the case, what’s the recommended way to share a variable among
>>> all the tasks within the same executor?
>>>
>>> 2015-01-21 15:04 GMT+08:00 Davies Liu <>:
>>>> Maybe some change related to closure serialization means LogParser is
>>>> no longer a singleton, so it is initialized for every task.
>>>> Could you change it to a Broadcast?
>>>> On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <>
>>>> wrote:
>>>> > We are currently migrating from Spark 1.1 to Spark 1.2, but found the
>>>> > program 3x slower, with nothing else changed.
>>>> > Note: our program on Spark 1.1 has successfully processed a whole
>>>> > year of data and has been quite stable.
>>>> >
>>>> > The main script is as follows:
>>>> >
>>>> > sc.textFile(inputPath)
>>>> > .flatMap(line => LogParser.parseLine(line))
>>>> > .groupByKey(new HashPartitioner(numPartitions))
>>>> > .mapPartitionsWithIndex(...)
>>>> > .foreach(_ => {})
>>>> >
>>>> > where LogParser is a singleton which may take some time to
>>>> > initialize and is shared across the executor.
>>>> >
>>>> > The flatMap stage is 3x slower.
>>>> >
>>>> > We tried changing spark.shuffle.manager back to hash, and
>>>> > spark.shuffle.blockTransferService back to nio, but it didn’t help.
>>>> >
>>>> > Could somebody explain the possible causes, or what we should test or
>>>> > change to find out?
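
For readers following the question about sharing a non-serializable value among all tasks in one executor: a minimal sketch of the usual approach is a Scala object holding a lazy val, so each executor JVM builds the value once on first use and nothing is shipped in the closure. ExpensiveParser, ParserHolder, initCount, the cache capacity and the "key=value" parsing are all hypothetical stand-ins, since the thread does not show LogParser itself. This only illustrates the pattern; it does not explain the 1.2 slowdown.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the thread's LogParser: not serializable,
// and it keeps an LRU cache of lines it has already parsed.
final class ExpensiveParser(capacity: Int) {
  // Access-ordered LinkedHashMap that evicts the least recently used entry
  // once the cache grows past `capacity`.
  private val cache =
    new JLinkedHashMap[String, Seq[(String, String)]](16, 0.75f, true) {
      override def removeEldestEntry(
          eldest: JMap.Entry[String, Seq[(String, String)]]): Boolean =
        size() > capacity
    }

  // Assumed toy format "key=value"; the real parsing logic is not in the thread.
  // Synchronized because several task threads in one executor share the cache.
  def parseLine(line: String): Seq[(String, String)] = cache.synchronized {
    val hit = cache.get(line)
    if (hit != null) hit
    else {
      val parsed = line.split('=') match {
        case Array(k, v) => Seq(k -> v)
        case _           => Seq.empty[(String, String)]
      }
      cache.put(line, parsed)
      parsed
    }
  }
}

// One instance per JVM. Scala initializes a lazy val at most once, even when
// many threads race to read it first.
object ParserHolder {
  // Counts how often the initializer ran; useful when checking whether the
  // parser is being rebuilt more often than expected.
  val initCount = new AtomicInteger(0)

  lazy val parser: ExpensiveParser = {
    initCount.incrementAndGet()
    new ExpensiveParser(capacity = 10000)
  }
}
```

In a job this would be referenced as `.flatMap(line => ParserHolder.parser.parseLine(line))`. Logging from inside the lazy val initializer on the executors is one way to check whether the parser really is being re-initialized for every task, as suggested earlier in the thread.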
