Maybe some change related to closure serialization means LogParser is
no longer a singleton, so it is initialized for every task.

Could you change it to a Broadcast variable?
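A minimal sketch of the Broadcast approach, assuming LogParser (or an instance of it) is Serializable; sc, inputPath and numPartitions are the same names as in the script quoted below, and the mapPartitionsWithIndex body is elided as in the original:

```scala
import org.apache.spark.HashPartitioner

// Serialize the parser once on the driver and ship it once per executor,
// instead of re-initializing it inside every task's closure.
val parserBc = sc.broadcast(LogParser)

sc.textFile(inputPath)
  .flatMap(line => parserBc.value.parseLine(line)) // .value reuses the per-executor copy
  .groupByKey(new HashPartitioner(numPartitions))
  .mapPartitionsWithIndex(...)
  .foreach(_ => {})
```

Each executor then deserializes the broadcast value at most once, so the per-task initialization cost should disappear regardless of how closures are serialized.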

On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
> Currently we are migrating from spark 1.1 to spark 1.2, but found the
> program 3x slower, with nothing else changed.
> note: our program in spark 1.1 has successfully processed a whole year of
> data, quite stably.
>
> the main script is as below
>
> sc.textFile(inputPath)
>   .flatMap(line => LogParser.parseLine(line))
>   .groupByKey(new HashPartitioner(numPartitions))
>   .mapPartitionsWithIndex(...)
>   .foreach(_ => {})
>
> where LogParser is a singleton which may take some time to initialize and
> is shared across the executors.
>
> the flatMap stage is 3x slower.
>
> We tried to change spark.shuffle.manager back to hash, and
> spark.shuffle.blockTransferService back to nio, but didn’t help.
>
> Could somebody explain possible causes, or what we should test or change
> to find it out?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
