Have you taken a look at SPARK-14915?
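
If I recall correctly, that ticket is about tasks failing with
CommitDeniedException as a side-effect of speculative execution. On the
assumption that this is what you're hitting (an assumption, not a
diagnosis), one thing worth trying is to rule out speculation explicitly,
e.g. with something like:

    import org.apache.spark.SparkConf

    // Sketch only: spark.speculation defaults to false, but setting it
    // explicitly guards against it being flipped on by cluster defaults,
    // so speculative attempts can't race the output committer.
    val conf = new SparkConf()
      .setAppName("session-rollup")
      .set("spark.speculation", "false")

The same thing can be passed on the command line via
--conf spark.speculation=false to spark-submit.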

On Tue, May 24, 2016 at 1:00 PM, Adrien Mogenet <
adrien.moge...@contentsquare.com> wrote:

> Hi,
>
> I'm wondering how Spark sets the "index" of a task.
> I'm asking this question because we have a job that constantly fails at
> task index = 421.
>
> When we increase the number of partitions, the job then fails at
> index = 4421. Increase it a little more, and it fails at index = 24421.
>
> Our job is as simple as "(1) read JSON -> (2) group by session identifier
> -> (3) write Parquet files", and it always fails somewhere at step (3)
> with a CommitDeniedException. We've identified that some of the trouble
> comes down to uneven data partitioning right after step (2), and we are
> now trying to better understand how Spark behaves.
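>
> Roughly, the job looks like this (a minimal sketch; the paths, the
> session field name, and the aggregation are illustrative, not our exact
> code):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>     import org.apache.spark.sql.SQLContext
>
>     val sc = new SparkContext(new SparkConf().setAppName("session-rollup"))
>     val sqlContext = new SQLContext(sc)
>
>     // (1) read JSON events
>     val events = sqlContext.read.json("hdfs:///path/to/events")
>     // (2) group by session identifier (the real aggregation differs)
>     val grouped = events.groupBy("session_id").count()
>     // (3) write Parquet files -- where the CommitDeniedException appears
>     grouped.write.parquet("hdfs:///path/to/output")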
>
> We're using Spark 1.5.2 with Scala 2.11, on top of Hadoop 2.6.0.
>
> --
>
> *Adrien Mogenet*
> Head of Backend/Infrastructure
> adrien.moge...@contentsquare.com
> http://www.contentsquare.com
> 50, avenue Montaigne - 75008 Paris
>
