Thanks for the response. I get the point: it sounds like today's Spark lineage
is not pushed to Tachyon lineage. It would be good to see how it works.
Jun Feng Liu.
Haoyuan Li <haoyuan.li@gmail.com>
2014-12-13 00:17
To: Jun Feng Liu/China/IBM@IBMCN
cc: Reynold Xin <[email protected]>, Andrew Ash <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Tachyon in Spark
Junfeng, by "off-heap solution", did you mean "rdd.persist(OFF_HEAP)"?
That feature is different from the lineage feature. You can use this
feature (rdd.persist(OFF_HEAP)) now for any Spark version later than 1.0.0
with Tachyon without a problem.
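
As a side note on the distinction between caching and lineage, here is a minimal sketch of lineage-based recovery in plain Python. It is a toy model under stated assumptions: the names Dataset, compute, and lose_cache are hypothetical and are not Spark or Tachyon APIs. It only illustrates the general idea that a lost in-memory copy can be rebuilt from a recorded parent and transformation rather than from a replica.

```python
from typing import Callable, List, Optional


class Dataset:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["Dataset"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.source = source    # durable input, set only on the root
        self.parent = parent    # lineage: the dataset this one derives from
        self.fn = fn            # lineage: transformation applied to the parent
        self.cache: Optional[List[int]] = None  # in-memory copy, may be lost
        self.recomputations = 0                 # how often data was rebuilt

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        # Record the dependency only; nothing is computed yet.
        return Dataset(parent=self, fn=fn)

    def compute(self) -> List[int]:
        # Serve from the in-memory copy when it is still available.
        if self.cache is not None:
            return self.cache
        # Otherwise rebuild from the durable source or from the parent.
        if self.parent is None:
            data = list(self.source)
        else:
            data = [self.fn(x) for x in self.parent.compute()]
        self.recomputations += 1
        self.cache = data
        return data

    def lose_cache(self) -> None:
        # Simulate losing the in-memory copy (for example, a node failure).
        self.cache = None


root = Dataset(source=[1, 2, 3])
doubled = root.map(lambda x: x * 2)
plus_one = doubled.map(lambda x: x + 1)

first = plus_one.compute()      # [3, 5, 7]

# Drop two cached copies; no replica exists anywhere.
plus_one.lose_cache()
doubled.lose_cache()

recovered = plus_one.compute()  # rebuilt from lineage, again [3, 5, 7]
```

In this sketch only the datasets whose copies were lost are recomputed; the root keeps its cached copy and is not rebuilt.
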
Regarding Reynold's last email, those are good points. Tachyon has provided
lineage for a while. We are working on enhancing this feature and its
integration with Spark.
Thanks,
Haoyuan
On Fri, Dec 12, 2014 at 5:06 AM, Jun Feng Liu <[email protected]> wrote:
>
> I think lineage is the key feature of Tachyon for reproducing an RDD when
> any error happens. Otherwise, there has to be some data replication among
> Tachyon nodes to ensure data redundancy for fault tolerance, and I think
> Tachyon is trying to avoid going down that path. Does that mean the off-heap
> solution is not ready yet if Tachyon lineage does not work right now?
>
> Best Regards
>
>
> *Jun Feng Liu*
> IBM China Systems & Technology Laboratory in Beijing
>
> Phone: 86-10-82452683
> E-mail: [email protected]
>
> BLD 28,ZGC Software Park
> No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
> China
>
> *Reynold Xin <[email protected]>*
> 2014/12/12 10:22
> To: Andrew Ash <[email protected]>
> cc: Jun Feng Liu/China/IBM@IBMCN, "[email protected]" <[email protected]>
> Subject: Re: Tachyon in Spark
>
>
>
>
> Actually HY emailed me offline about this and this is supported in the
> latest version of Tachyon. It is a hard problem to push this into storage;
> need to think about how to handle isolation, resource allocation, etc.
>
> https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/master/Dependency.java
>
> On Thu, Dec 11, 2014 at 3:54 PM, Reynold Xin <[email protected]> wrote:
>
> > I don't think the lineage thing is even turned on in Tachyon - it was
> > mostly a research prototype, so I don't think it'd make sense for us to
> > use that.
> >
> >
> > On Thu, Dec 11, 2014 at 3:51 PM, Andrew Ash <[email protected]> wrote:
> >
> >> I'm interested in understanding this as well. One of the main ways
> >> Tachyon is supposed to realize performance gains without sacrificing
> >> durability is by storing the lineage of data rather than full copies of
> >> it (similar to Spark). But if Spark isn't sending lineage information
> >> into Tachyon, then I'm not sure how this isn't a durability concern.
> >>
> >> On Wed, Dec 10, 2014 at 5:47 AM, Jun Feng Liu <[email protected]> wrote:
> >>
> >> > Does Spark today really leverage Tachyon lineage to process data? It
> >> > seems like the application should call the createDependency function in
> >> > TachyonFS to create a new lineage node, but I did not find any place in
> >> > the Spark code that calls it. Did I miss anything?
> >> >
> >> > Best Regards
> >> >
> >> >
> >> > *Jun Feng Liu*
> >>
> >
> >
>
>
--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/