I don't think there is an approximate timescale right now, and any implementation would likely depend on a solid Java implementation of Arrow being ready first. There isn't even a guarantee that it will happen, although I'm interested in making it happen in some places where it makes sense.
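For context, the kind of JVM-side Arrow usage that would need to be solid first is building and reading columnar vectors off-heap. A minimal sketch using the arrow-vector Java library (class names reflect later Arrow releases than were available at the time of this thread):

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;

    public class ArrowVectorSketch {
        public static void main(String[] args) {
            // Arrow tracks all off-heap memory through an allocator.
            try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
                 IntVector vector = new IntVector("values", allocator)) {
                vector.allocateNew(3);   // reserve off-heap space for 3 ints
                vector.set(0, 10);
                vector.set(1, 20);
                vector.set(2, 30);
                vector.setValueCount(3); // mark how many slots are populated
                for (int i = 0; i < vector.getValueCount(); i++) {
                    System.out.println(vector.get(i));
                }
            } // try-with-resources frees the off-heap buffers
        }
    }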
On Fri, Aug 5, 2016 at 2:18 PM, Jim Pivarski <jpivar...@gmail.com> wrote:

> I see. I've already started working with Arrow-C++ and talking to members
> of the Arrow community, so I'll keep doing that.
>
> As a follow-up question, is there an approximate timescale for when Spark
> will support Arrow? I'd just like to know that all the pieces will come
> together eventually.
>
> (In this forum, most of the discussion about Arrow is about PySpark and
> Pandas, not Spark in general.)
>
> Best,
> Jim
>
> On Aug 5, 2016 2:43 PM, "Holden Karau" <hol...@pigscanfly.ca> wrote:
>
>> Spark does not currently support Apache Arrow. A good place to chat would
>> be the Arrow mailing list, where they are making progress towards unified
>> JVM and Python/R support, which is more or less a precondition for a
>> functioning Arrow interface between Spark and Python.
>>
>> On Fri, Aug 5, 2016 at 12:40 PM, jpivar...@gmail.com
>> <jpivar...@gmail.com> wrote:
>>
>>> In a few earlier posts [ 1
>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>
>>> ] [ 2
>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>
>>> ], I asked about moving data from C++ into a Spark data source (RDD,
>>> DataFrame, or Dataset). The issue is that even the off-heap cache might
>>> not have a stable representation: it might change from one version to
>>> the next.
>>>
>>> I recently learned about Apache Arrow, a data layer that Spark shares,
>>> or will someday share, with Pandas, Impala, etc. Suppose I can fill a
>>> buffer (such as a direct ByteBuffer) with Arrow-formatted data: is there
>>> an easy, or even zero-copy, way to use it in Spark? Is that an API that
>>> could be developed?
>>>
>>> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place
>>> to ask this question?
>>>
>>> Thanks,
>>> -- Jim

--
Cell: 425-233-8271
Twitter: https://twitter.com/holdenkarau
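To make the buffer-interchange question concrete: one path that works today (though it copies rather than being zero-copy) is for the C++ side to serialize record batches in Arrow's IPC stream format and for the JVM side to deserialize them with arrow-vector. A hedged sketch, assuming `bytes` holds IPC stream data produced elsewhere (e.g. by Arrow-C++); class names again reflect later Arrow releases than were available at the time of this thread:

    import java.io.ByteArrayInputStream;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.ipc.ArrowStreamReader;

    public class ArrowInterchangeSketch {
        // `bytes` is assumed to hold Arrow IPC stream data written by
        // another process, e.g. Arrow-C++ filling a buffer as described
        // in the thread above.
        public static void readBatches(byte[] bytes) throws Exception {
            try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
                 ArrowStreamReader reader = new ArrowStreamReader(
                         new ByteArrayInputStream(bytes), allocator)) {
                VectorSchemaRoot root = reader.getVectorSchemaRoot();
                while (reader.loadNextBatch()) { // one record batch at a time
                    System.out.println("rows: " + root.getRowCount());
                }
            }
        }
    }

Whether Spark itself could consume such a buffer zero-copy remained an open question at the time of this thread.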