Junfeng, by "off-heap solution", did you mean "rdd.persist(OFF_HEAP)"? That feature is different from the lineage feature. You can use it (rdd.persist(OFF_HEAP)) with Tachyon now, in any Spark version later than 1.0.0, without a problem.
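To show why off-heap caching and storage-side lineage are separate concerns, here is a toy Python model. It is not Spark or Tachyon code, and every class and method name in it is hypothetical. The sketch assumes that the external store only holds cached bytes, and that the client (Spark, in the real system) keeps its own lineage for each partition. Under that assumption, a block the store loses is treated as a cache miss and recomputed, so it is not lost for good.

```python
from typing import Callable, Dict, List, Optional


class ExternalBlockStore:
    """Hypothetical stand-in for an off-heap store such as Tachyon.

    It only holds data for block IDs; it knows nothing about how they were produced.
    """

    def __init__(self) -> None:
        self._blocks: Dict[str, List[int]] = {}

    def put(self, block_id: str, data: List[int]) -> None:
        self._blocks[block_id] = list(data)

    def get(self, block_id: str) -> Optional[List[int]]:
        return self._blocks.get(block_id)

    def lose(self, block_id: str) -> None:
        # Simulate a worker failure that drops the block (no replicas kept).
        self._blocks.pop(block_id, None)


class CachedPartition:
    """Client-side (Spark-like) view of one cached partition.

    The lineage lives here, as a function that recomputes the partition's data.
    """

    def __init__(self, block_id: str, compute: Callable[[], List[int]],
                 store: ExternalBlockStore) -> None:
        self.block_id = block_id
        self.compute = compute
        self.store = store
        self.recomputations = 0

    def get_or_compute(self) -> List[int]:
        cached = self.store.get(self.block_id)
        if cached is not None:
            return cached
        # Cache miss: rebuild from the client's own lineage, then re-cache off-heap.
        self.recomputations += 1
        data = self.compute()
        self.store.put(self.block_id, data)
        return data


cache_store = ExternalBlockStore()
source = [1, 2, 3, 4]
part = CachedPartition("rdd_0_0", lambda: [x * 10 for x in source], cache_store)

first = part.get_or_compute()     # computed once and cached off-heap
cache_store.lose("rdd_0_0")       # the storage layer loses the block
second = part.get_or_compute()    # recomputed from the client's lineage
print(first, second, part.recomputations)  # [10, 20, 30, 40] [10, 20, 30, 40] 2
```

In this model the store never needs to know how a block was derived. Durability of the cached data comes entirely from the client-side recompute function.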
Regarding Reynold's last email, those are good points. Tachyon has provided this for a while. We are working on enhancing this feature and its integration with Spark.

Thanks,
Haoyuan

On Fri, Dec 12, 2014 at 5:06 AM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:

> I think lineage is the key feature of Tachyon for reproducing an RDD when
> an error happens. Otherwise, there has to be some data replication among
> Tachyon nodes to ensure data redundancy for fault tolerance, and I think
> Tachyon is trying to avoid going down that path. Does it mean the off-heap
> solution is not ready yet if Tachyon lineage does not work right now?
>
> Best Regards
>
> Jun Feng Liu
> IBM China Systems & Technology Laboratory in Beijing
> ------------------------------
> Phone: 86-10-82452683
> E-mail: liuj...@cn.ibm.com
> BLD 28, ZGC Software Park
> No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193
> China
>
> Reynold Xin <r...@databricks.com>
> 2014/12/12 10:22
> To: Andrew Ash <and...@andrewash.com>
> cc: Jun Feng Liu/China/IBM@IBMCN, "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Tachyon in Spark
>
> Actually HY emailed me offline about this, and this is supported in the
> latest version of Tachyon. It is a hard problem to push this into storage;
> we need to think about how to handle isolation, resource allocation, etc.
>
> https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/master/Dependency.java
>
> On Thu, Dec 11, 2014 at 3:54 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > I don't think the lineage thing is even turned on in Tachyon - it was
> > mostly a research prototype, so I don't think it'd make sense for us to
> > use that.
> >
> > On Thu, Dec 11, 2014 at 3:51 PM, Andrew Ash <and...@andrewash.com> wrote:
> >
> > > I'm interested in understanding this as well. One of the main ways
> > > Tachyon is supposed to realize performance gains without sacrificing
> > > durability is by storing the lineage of data rather than full copies
> > > of it (similar to Spark). But if Spark isn't sending lineage
> > > information into Tachyon, then I'm not sure how this isn't a
> > > durability concern.
> > >
> > > On Wed, Dec 10, 2014 at 5:47 AM, Jun Feng Liu <liuj...@cn.ibm.com> wrote:
> > >
> > > > Does Spark today really leverage Tachyon lineage to process data? It
> > > > seems like the application should call the createDependency function
> > > > in TachyonFS to create a new lineage node. But I did not find any
> > > > place in the Spark code that calls it. Did I miss anything?
> > > >
> > > > Best Regards
> > > >
> > > > Jun Feng Liu

--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
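The durability question raised in the quoted thread (storing lineage instead of full copies) can be illustrated with a second toy Python model. This is not Tachyon's implementation and does not use the TachyonFS createDependency API. The LineageRecord and LineageStore names are hypothetical. The store keeps a single copy of each file and, optionally, a record of the parent files plus a deterministic transform. A lost file that has a record is regenerated from its parents. A lost file without one is simply gone, which is the concern if lineage is never registered with the storage layer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: the parent files and a deterministic transform."""
    parents: List[str]
    transform: Callable[[List[List[int]]], List[int]]


class LineageStore:
    """Toy storage layer: one copy per file (no replicas) plus optional lineage."""

    def __init__(self) -> None:
        self.files: Dict[str, List[int]] = {}
        self.lineage: Dict[str, LineageRecord] = {}

    def write(self, path: str, data: List[int],
              record: Optional[LineageRecord] = None) -> None:
        self.files[path] = list(data)
        if record is not None:
            self.lineage[path] = record

    def lose(self, path: str) -> None:
        # Simulate losing the only in-memory copy of a file.
        self.files.pop(path, None)

    def read(self, path: str) -> List[int]:
        if path in self.files:
            return self.files[path]
        record = self.lineage.get(path)
        if record is None:
            # No replica and no lineage: the data cannot be rebuilt.
            raise FileNotFoundError(f"{path} lost and no lineage was registered")
        # Read (regenerating if needed) each parent, then re-run the transform.
        data = record.transform([self.read(p) for p in record.parents])
        self.files[path] = data
        return data


lineage_store = LineageStore()
lineage_store.write("/input", [1, 2, 3])  # base data, assumed durable in this sketch
double = LineageRecord(["/input"], lambda parents: [2 * x for x in parents[0]])
lineage_store.write("/with_lineage", [2, 4, 6], double)
lineage_store.write("/without_lineage", [2, 4, 6])  # written with no lineage record

lineage_store.lose("/with_lineage")
lineage_store.lose("/without_lineage")

recovered = lineage_store.read("/with_lineage")
print(recovered)  # [2, 4, 6]

try:
    lineage_store.read("/without_lineage")
    missing_error = None
except FileNotFoundError as err:
    missing_error = str(err)
print(missing_error)  # /without_lineage lost and no lineage was registered
```

The trade-off is storage versus recomputation. Replication keeps several full copies of every file. Lineage keeps one copy plus a small record, but it makes the storage layer re-run transforms after a failure. That is presumably where the isolation and resource-allocation questions Reynold mentions come in.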