Thanks JaeBoo. In our case, the shuffle writes are similar.

2015-01-21 17:01 GMT+08:00 JaeBoo Jung <itsjb.j...@samsung.com>:

>  I recently faced a similar issue, but unfortunately I could not find
> out why it happened.
>
> Here's the JIRA ticket from my previous post:
> https://issues.apache.org/jira/browse/SPARK-5081
>
> Please check the shuffle I/O differences between the two versions in the
> Spark web UI, since they may be related to my case.
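>
> If the web UI numbers are hard to line up across runs, a listener can log
> the shuffle write size per task instead (a rough sketch against the Spark
> 1.x listener API; taskMetrics can be null for failed tasks):
>
> import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
>
> sc.addSparkListener(new SparkListener {
>   // Called once per finished task; shuffleWriteMetrics is an Option in 1.x.
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>     for {
>       metrics <- Option(taskEnd.taskMetrics)
>       shuffle <- metrics.shuffleWriteMetrics
>     } println(s"task ${taskEnd.taskInfo.taskId}: " +
>         s"${shuffle.shuffleBytesWritten} shuffle bytes written")
>   }
> })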
>
>
>
> Thanks
>
> Kevin
>
>
>
> ------- *Original Message* -------
>
> *Sender* : Fengyun RAO <raofeng...@gmail.com>
>
> *Date* : 2015-01-21 17:41 (GMT+09:00)
>
> *Title* : Re: spark 1.2 three times slower than spark 1.1, why?
>
>
>
> Maybe you mean a different spark-submit script?
>
> We also use the same spark-submit script for both versions, and thus the
> same memory, cores, and other configuration.
>
> 2015-01-21 15:45 GMT+08:00 Sean Owen <so...@cloudera.com>:
>
>> I don't know of any reason to think the singleton pattern doesn't work or
>> works differently. I wonder if, for example, task scheduling is different
>> in 1.2, so that you have more partitions across more workers and are
>> therefore loading more copies into your singletons, more slowly.
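>>
>> One way to test that hypothesis (a sketch; LogParser here stands in for
>> the real singleton, with only the logging lines added): compare the
>> partition count between 1.1 and 1.2, and log each singleton initialization
>> so the copies can be counted in the executor logs.
>>
>> // Driver side: how many tasks will the flatMap stage run?
>> println(s"input partitions: ${sc.textFile(inputPath).partitions.length}")
>>
>> // Executor side: an object's body runs once per JVM, on first access.
>> object LogParser {
>>   System.err.println(s"LogParser initialized on " +
>>     s"${java.net.InetAddress.getLocalHost.getHostName} " +
>>     s"at ${System.currentTimeMillis()}")
>>   // ... the expensive initialization and parse methods go here ...
>> }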
>>  On Jan 21, 2015 7:13 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:
>>
>>>  The LogParser instance is not serializable, and thus cannot be a
>>> broadcast variable.
>>>
>>> What's worse, it contains an LRU cache, which is essential to
>>> performance and which we would like to share among all the tasks on the
>>> same node.
>>>
>>> If that is the case, what's the recommended way to share a variable
>>> among all the tasks within the same executor?
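>>>
>>> A common way to do this (a sketch with hypothetical names;
>>> SharedLogParser stands in for the real parser): a JVM-level singleton
>>> object is created at most once per executor process and is shared by
>>> every task running in it, so neither the parser nor its LRU cache needs
>>> to be serializable.
>>>
>>> import java.util.{Collections, LinkedHashMap, Map => JMap}
>>>
>>> object SharedLogParser {
>>>   private val MaxEntries = 10000
>>>   // Access-ordered LinkedHashMap doubles as a simple LRU cache; wrapped
>>>   // for thread safety, since tasks on multiple cores hit it concurrently.
>>>   lazy val cache: JMap[String, String] = Collections.synchronizedMap(
>>>     new LinkedHashMap[String, String](MaxEntries, 0.75f, true) {
>>>       override def removeEldestEntry(e: JMap.Entry[String, String]): Boolean =
>>>         size() > MaxEntries
>>>     })
>>> }
>>>
>>> Tasks then reference SharedLogParser.cache directly inside their closures;
>>> the first task to touch it on each executor pays the initialization cost.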
>>>
>>> 2015-01-21 15:04 GMT+08:00 Davies Liu <dav...@databricks.com>:
>>>
>>>> Maybe some change related to closure serialization causes LogParser to
>>>> no longer be a singleton, so it is initialized for every task.
>>>>
>>>> Could you change it to a broadcast variable?
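>>>>
>>>> A minimal sketch of that suggestion (SerializableLogParser is
>>>> hypothetical; as the reply above notes, the real LogParser is not
>>>> serializable, so this only applies if the parser can be made so):
>>>>
>>>> val parser = new SerializableLogParser()  // hypothetical, extends Serializable
>>>> val parserBc = sc.broadcast(parser)       // shipped to each executor once
>>>> val parsed = sc.textFile(inputPath)
>>>>   .flatMap(line => parserBc.value.parseLine(line))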
>>>>
>>>> On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com>
>>>> wrote:
>>>> > We are currently migrating from Spark 1.1 to Spark 1.2, but found the
>>>> > program runs 3x slower, with nothing else changed.
>>>> > Note: our program on Spark 1.1 has successfully processed a whole year
>>>> > of data, quite stably.
>>>> >
>>>> > The main script is as follows:
>>>> >
>>>> > sc.textFile(inputPath)
>>>> >   .flatMap(line => LogParser.parseLine(line))
>>>> >   .groupByKey(new HashPartitioner(numPartitions))
>>>> >   .mapPartitionsWithIndex(...)
>>>> >   .foreach(_ => {})
>>>> >
>>>> > where LogParser is a singleton which may take some time to initialize
>>>> > and is shared across the executor.
>>>> >
>>>> > The flatMap stage is 3x slower.
>>>> >
>>>> > We tried to change spark.shuffle.manager back to hash, and
>>>> > spark.shuffle.blockTransferService back to nio, but it didn’t help.
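>>>> >
>>>> > For reference, that rollback sets the two keys whose defaults changed
>>>> > in 1.2 (a sketch; the same pairs can be passed via --conf to
>>>> > spark-submit):
>>>> >
>>>> > import org.apache.spark.SparkConf
>>>> >
>>>> > val conf = new SparkConf()
>>>> >   .set("spark.shuffle.manager", "hash")             // 1.2 default: sort
>>>> >   .set("spark.shuffle.blockTransferService", "nio") // 1.2 default: netty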
>>>> >
>>>> > Could somebody explain the possible causes, or suggest what we should
>>>> > test or change to find them?
>>>>
>>>
>>>
>
