Have you taken a look at SPARK-14915?

On Tue, May 24, 2016 at 1:00 PM, Adrien Mogenet <adrien.moge...@contentsquare.com> wrote:
> Hi,
>
> I'm wondering how Spark sets the "index" of a task?
> I'm asking this question because we have a job that constantly fails at
> task index = 421.
>
> When we increase the number of partitions, it then fails at index = 4421.
> Increase it a little bit more, and now it's 24421.
>
> Our job is as simple as "(1) read JSON -> (2) group by session identifier
> -> (3) write Parquet files" and it always fails somewhere in step (3) with a
> CommitDeniedException. We've identified that some of the trouble is basically
> due to uneven data repartitioning right after step (2), and we are now trying
> to deepen our understanding of how Spark behaves.
>
> We're using Spark 1.5.2, Scala 2.11, on top of Hadoop 2.6.0.
>
> --
>
> *Adrien Mogenet*
> Head of Backend/Infrastructure
> adrien.moge...@contentsquare.com
> http://www.contentsquare.com
> 50, avenue Montaigne - 75008 Paris
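For reference, the three-step job described in the quoted message can be sketched in Spark 1.5-era Scala roughly as follows. This is a minimal sketch under assumptions: the input/output paths and the `sessionId` column name are hypothetical placeholders, not taken from the original post.

```scala
// Sketch of the described pipeline, assuming the Spark 1.5.x DataFrame API.
// Paths and the "sessionId" column name are hypothetical placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("session-aggregation"))
val sqlContext = new SQLContext(sc)

// (1) Read JSON input.
val events = sqlContext.read.json("hdfs:///input/events/*.json")

// (2) Group by session identifier. Skewed keys here mean a few very large
//     groups land on a few tasks, which run far longer than the rest.
val sessions = events.groupBy("sessionId").count()

// (3) Write Parquet. Each task asks the OutputCommitCoordinator for
//     permission to commit its output; a CommitDeniedException typically
//     means another attempt of the same task (e.g. a speculative or
//     retried one) was authorized to commit instead.
sessions.write.parquet("hdfs:///output/sessions.parquet")
```

Note that the task "index" reported in the failure is just the task's partition number within its stage, which is why repartitioning shifts the failing index rather than eliminating the failure.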