Re: Tachyon in Spark

Haoyuan Li Fri, 12 Dec 2014 08:20:06 -0800

Junfeng, by "off-heap solution", did you mean rdd.persist(OFF_HEAP)?
That feature is separate from the lineage feature. You can use
rdd.persist(OFF_HEAP) with Tachyon today, without problems, on any Spark
version later than 1.0.0.
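
For anyone following along, here is a minimal sketch of what that call looks
like from a Spark application. It assumes a Spark 1.x build and a Tachyon
master reachable at tachyon://localhost:19998; the object name, the local
master setting, the Tachyon URL and the sample data are placeholders for
illustration, not details from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OffHeapPersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("OffHeapPersistSketch")
      .setMaster("local[2]") // assumption: local run, for illustration only
      // Assumption: Tachyon master address; adjust to your deployment.
      .set("spark.tachyonStore.url", "tachyon://localhost:19998")

    val sc = new SparkContext(conf)
    try {
      val doubled = sc.parallelize(1 to 1000).map(_ * 2)

      // Ask Spark to keep the RDD's blocks off the JVM heap (in Tachyon on
      // Spark 1.x). This stores blocks only; it does not register any
      // lineage with Tachyon.
      val cached = doubled.persist(StorageLevel.OFF_HEAP)

      // The first action materializes and stores the blocks; the second
      // can read them back instead of recomputing the map.
      println(cached.count())
      println(cached.reduce(_ + _))
    } finally {
      sc.stop()
    }
  }
}
```

If the off-heap blocks are lost, Spark recomputes the affected partitions
from its own RDD lineage, which is why this path does not depend on
Tachyon's lineage feature.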


Regarding Reynold's last email, those are good points. Tachyon has provided
this for a while. We are working on enhancing this feature and its
integration with Spark.

Thanks,

Haoyuan

On Fri, Dec 12, 2014 at 5:06 AM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:
>
> I think lineage is the key feature of Tachyon for reproducing an RDD when
> an error happens. Otherwise, data would have to be replicated across
> Tachyon nodes to provide redundancy for fault tolerance, and I think
> Tachyon is trying to avoid going down that path. Does this mean the
> off-heap solution is not ready yet if Tachyon lineage does not work right now?
>
> Best Regards
>
>
> Jun Feng Liu
> IBM China Systems & Technology Laboratory in Beijing
> Phone: 86-10-82452683
> E-mail: liuj...@cn.ibm.com
> BLD 28, ZGC Software Park
> No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193
> China
>
> Reynold Xin <r...@databricks.com> wrote on 2014/12/12 10:22:
> To: Andrew Ash <and...@andrewash.com>
> Cc: Jun Feng Liu/China/IBM@IBMCN, "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Tachyon in Spark
>
> Actually HY emailed me offline about this and this is supported in the
> latest version of Tachyon. It is a hard problem to push this into storage;
> need to think about how to handle isolation, resource allocation, etc.
>
>
> https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/master/Dependency.java
>
> On Thu, Dec 11, 2014 at 3:54 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > I don't think the lineage thing is even turned on in Tachyon - it was
> > mostly a research prototype, so I don't think it'd make sense for us to
> > use that.
> >
> >
> > On Thu, Dec 11, 2014 at 3:51 PM, Andrew Ash <and...@andrewash.com> wrote:
> >
> >> I'm interested in understanding this as well.  One of the main ways
> >> Tachyon is supposed to realize performance gains without sacrificing
> >> durability is by storing the lineage of data rather than full copies of
> >> it (similar to Spark).  But if Spark isn't sending lineage information
> >> into Tachyon, then I'm not sure how this isn't a durability concern.
> >>
> >> On Wed, Dec 10, 2014 at 5:47 AM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:
> >>
> >> > Does Spark today really leverage Tachyon lineage to process data? It
> >> > seems like the application should call the createDependency function in
> >> > TachyonFS to create a new lineage node. But I did not find any place
> >> > that calls it in the Spark code. Did I miss anything?
> >> >
> >> > Best Regards
> >>
> >
> >
>
>

-- 
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
