Maybe some change related to closure serialization means LogParser is no longer a singleton, so it is initialized for every task.
Could you change it to a Broadcast?

On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
> Currently we are migrating from Spark 1.1 to Spark 1.2, but found the
> program 3x slower, with nothing else changed.
> Note: our program on Spark 1.1 has successfully processed a whole year of
> data and is quite stable.
>
> The main script is as below:
>
> sc.textFile(inputPath)
>   .flatMap(line => LogParser.parseLine(line))
>   .groupByKey(new HashPartitioner(numPartitions))
>   .mapPartitionsWithIndex(...)
>   .foreach(_ => {})
>
> where LogParser is a singleton which may take some time to initialize and
> is shared across the executor.
>
> The flatMap stage is 3x slower.
>
> We tried changing spark.shuffle.manager back to hash, and
> spark.shuffle.blockTransferService back to nio, but that didn't help.
>
> Could somebody explain possible causes, or what we should test or change to
> find out?
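
For reference, the Broadcast suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the `LogParser` class here is a hypothetical stand-in (the real parser is not shown in the thread), it is assumed the parser can be made a `Serializable` instance rather than a singleton object, and the `mapPartitionsWithIndex` body is a placeholder for the step elided in the original script.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.2

// Hypothetical stand-in for the real parser; it must be Serializable to be broadcast.
class LogParser extends Serializable {
  // Placeholder for whatever expensive state the real parser builds.
  private val fieldSeparator = "\t"

  // Assumed signature: zero or more (key, value) pairs per input line.
  def parseLine(line: String): Seq[(String, String)] =
    line.split(fieldSeparator, 2) match {
      case Array(k, v) => Seq((k, v))
      case _           => Seq.empty
    }
}

object BroadcastParserJob {
  def main(args: Array[String]): Unit = {
    // Expects exactly two arguments: input path and partition count.
    val Array(inputPath, numPartitionsArg) = args
    val numPartitions = numPartitionsArg.toInt
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastParserJob"))

    // Build the parser once on the driver; Spark ships it to each executor once
    // instead of serializing it with every task closure.
    val parser = sc.broadcast(new LogParser)

    sc.textFile(inputPath)
      .flatMap(line => parser.value.parseLine(line))
      .groupByKey(new HashPartitioner(numPartitions))
      .mapPartitionsWithIndex((index, groups) => groups) // placeholder for the elided step
      .foreach(_ => ())

    sc.stop()
  }
}
```

Whether this actually recovers the lost performance depends on where the time goes; if the parser's initialization cannot be serialized, a lazily initialized field inside a singleton object may be a better fit than a broadcast.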