Supporting non-JVM code without memory copying and serialization is
actually one of the motivations behind Tungsten. We didn't talk much about
it since it is not end-user-facing and it is still too early. There are
still a few challenges:

1. Spark cannot run entirely in off-heap mode yet (by "entirely" I mean
all of the data-plane memory, not control-plane memory such as RPCs, since
that doesn't matter much). There is nothing fundamental blocking this; it
just takes a while to make sure all code paths allocate and free memory
using the proper allocators.
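To make the allocator discipline in point 1 concrete, here is a minimal
sketch of what an off-heap allocate/free path looks like underneath, using
sun.misc.Unsafe directly. This is an illustration of the general technique,
not Spark's actual allocator API; the class name and method are made up for
the example:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    // sun.misc.Unsafe is not public API, so it has to be obtained reflectively.
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static long roundTrip() {
        // Allocate 8 bytes off-heap. This memory is invisible to the GC,
        // which is exactly why every allocation needs a matching free --
        // the discipline point 1 refers to.
        long address = UNSAFE.allocateMemory(8);
        try {
            UNSAFE.putLong(address, 42L);        // write at the raw address
            return UNSAFE.getLong(address);      // read it back
        } finally {
            UNSAFE.freeMemory(address);          // explicit free, no GC safety net
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

A non-JVM library handed such an address could read the same bytes without
any copy, which is the appeal; the flip side is that a leak or double-free
on either side corrupts memory rather than triggering an exception.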

2. The memory layout of data is still in flux, since we are only 4 months
into Tungsten. It will change pretty frequently for the foreseeable
future, and as a result, the C++ side of things will have to change as well.
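For the JNI hand-off that point 2 alludes to, the usual zero-copy vehicle
is a direct ByteBuffer: it is backed by off-heap memory, and native code can
obtain its raw pointer via JNI's GetDirectBufferAddress without copying.
The sketch below shows only the Java side; the class name and the two-int
layout are invented for the example and are not Tungsten's actual layout
(which, per the above, is still changing):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferShare {
    public static ByteBuffer build() {
        // allocateDirect places the buffer outside the Java heap, so a
        // C++ library reached via JNI can read it in place, zero-copy.
        // nativeOrder() matters: the native side reads raw bytes and must
        // agree on endianness.
        ByteBuffer buf = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());
        buf.putInt(0, 7);   // hypothetical field at offset 0
        buf.putInt(4, 13);  // hypothetical field at offset 4
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = build();
        System.out.println(buf.isDirect());
        System.out.println(buf.getInt(0) + buf.getInt(4));
    }
}
```

Any change to the layout (offsets, widths, null tracking) has to be
mirrored in the C++ reader byte-for-byte, which is why a fast-moving format
makes the native side expensive to maintain.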



On Sat, Aug 29, 2015 at 12:29 PM, Timothy Chen <tnac...@gmail.com> wrote:

> I would also like to see data shared off-heap with a 3rd party C++
> library via JNI. I think the complications would be how to manage this
> memory and make sure the 3rd party libraries also adhere to the
> access contracts.
>
> Tim
>
> On Sat, Aug 29, 2015 at 12:17 PM, Paul Weiss <paulweiss....@gmail.com>
> wrote:
> > Hi,
> >
> > Would the benefits of Project Tungsten be available for access by non-JVM
> > programs directly into the off-heap memory?  Spark using DataFrames with
> > the Tungsten improvements will definitely help analytics within the JVM
> > world, but accessing outside 3rd party C++ libraries is a challenge,
> > especially when trying to do it with zero copy.
> >
> > Ideally the off-heap memory would be accessible to a non-JVM program and
> > be invoked in-process using JNI per partition.  The alternatives to this
> > involve the additional cost of starting another process if using pipes,
> > as well as an additional copy of all the data.
> >
> > In addition to read-only non-JVM access in-process, would there be a way
> > to share the DataFrame that is in memory out of process and across Spark
> > contexts?  This way an expensive, complicated initial build-up of a
> > DataFrame would not have to be replicated, and the startup cost would not
> > have to be paid again on failure.
> >
> > thanks,
> >
> > -paul
> >
>
