Spark Streaming essentially does this by saving the DAG of DStreams, from
which the DAG of RDDs can be deterministically regenerated upon recovery
from failure. The progress information (which batches have finished, which
batches are queued, etc.) is also saved, so that after recovery the system
can resume from where it was before the failure. This was conceptually easy
to do because the RDDs in every batch are generated deterministically.
Extending this to a general Spark program with arbitrary RDD computations
is conceptually possible, but not that easy to do.
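As a toy illustration (plain Python, not Spark APIs; all names below are
made up), the key point is that each batch's work is a deterministic
function of the batch interval, so persisting only the progress metadata is
enough to resume after a failure:

```python
import json, os, tempfile

def batch_computation(batch_id):
    # Deterministically "regenerated" work for a batch, analogous to the
    # RDD DAG produced from the DStream DAG for that batch interval.
    return sum(range(batch_id * 10, batch_id * 10 + 10))

def run(checkpoint_path, total_batches):
    # On (re)start, load the progress metadata if a checkpoint exists.
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = {int(k): v for k, v in json.load(f).items()}
    # Resume from the first unfinished batch.
    for batch_id in range(total_batches):
        if batch_id in done:
            continue  # already completed before the failure
        done[batch_id] = batch_computation(batch_id)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # persist progress after each batch
    return done

path = os.path.join(tempfile.mkdtemp(), "progress.json")
first = run(path, 3)    # initial run completes batches 0-2
resumed = run(path, 5)  # "recovered" run only executes batches 3-4
print(sorted(resumed))  # [0, 1, 2, 3, 4]
```

In Spark Streaming itself, this is what enabling metadata checkpointing via
`ssc.checkpoint(checkpointDir)` together with
`StreamingContext.getOrCreate` provides.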

On Wed, Dec 10, 2014 at 7:34 PM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:

> Right, perhaps also need preserve some DAG information? I am wondering if
> there is any work around this.
>
>
>    *Sandy Ryza <sandy.r...@cloudera.com>*
>
>    2014-12-11 01:34
>
>    To: Jun Feng Liu/China/IBM@IBMCN
>    Cc: Reynold Xin <r...@databricks.com>, "dev@spark.apache.org" <
>    dev@spark.apache.org>
>    Subject: Re: HA support for Spark
>
> I think that if we were able to maintain the full set of created RDDs as
> well as some scheduler and block manager state, it would be enough for most
> apps to recover.
>
> On Wed, Dec 10, 2014 at 5:30 AM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:
>
> > Well, it should not be mission impossible, given that so many HA
> > solutions exist today. I would be interested to know if there are any
> > specific difficulties.
> >
> > Best Regards
> >
> >
> > *Jun Feng Liu*
> > IBM China Systems & Technology Laboratory in Beijing
> >
> >   ------------------------------
> > *Phone:* 86-10-82452683
> > *E-mail:* liuj...@cn.ibm.com
> >
> > BLD 28,ZGC Software Park
> > No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
> > China
> >
> >
> >
> >
> >
> >  *Reynold Xin <r...@databricks.com>*
> >
> > 2014/12/10 16:30
> > To: Jun Feng Liu/China/IBM@IBMCN
> > Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> > Subject: Re: HA support for Spark
> >
> >
> >
> >
> > This would be plausible for specific purposes such as Spark Streaming or
> > Spark SQL, but I don't think it is doable for a general Spark driver,
> > since it is just a normal JVM process with arbitrary program state.
> >
> > On Wed, Dec 10, 2014 at 12:25 AM, Jun Feng Liu <liuj...@cn.ibm.com>
> wrote:
> >
> > > Do we have any high-availability support at the Spark driver level? For
> > > example, could the Spark driver move to another node and continue
> > > execution when a failure happens? I can see that RDD checkpointing helps
> > > to serialize the state of an RDD, and I can imagine loading the
> > > checkpoint from another node when an error happens, but it seems we
> > > would lose track of all the task statuses, and even the executor
> > > information, that are maintained in the SparkContext. I am not sure if
> > > there is any existing mechanism I can leverage to do that. Thanks for
> > > any suggestions.
> > >
> > > Best Regards
> > >
> > >
> > > *Jun Feng Liu*
> > > IBM China Systems & Technology Laboratory in Beijing
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>
