Supporting non-JVM code without memory copying and serialization is actually one of the motivations behind Tungsten. We haven't talked much about it since it is not end-user-facing and it is still early. There are still a few challenges:
1. Spark cannot yet run entirely in off-heap mode (by "entirely" I'm referring to all the data-plane memory, not control-plane memory such as RPCs, since that doesn't matter much). There is nothing fundamental blocking this; it just takes a while to make sure all code paths allocate and free memory using the proper allocators.

2. The memory layout of data is still in flux, since we are only 4 months into Tungsten. It will change pretty frequently for the foreseeable future, and as a result the C++ side of things will have to change as well.

On Sat, Aug 29, 2015 at 12:29 PM, Timothy Chen <tnac...@gmail.com> wrote:
> I would also like to see data shared off-heap with a 3rd-party C++
> library via JNI. I think the complications would be how to manage this
> memory and make sure the 3rd-party libraries also adhere to the access
> contracts.
>
> Tim
>
> On Sat, Aug 29, 2015 at 12:17 PM, Paul Weiss <paulweiss....@gmail.com> wrote:
> > Hi,
> >
> > Would the benefits of project Tungsten be available for access by non-JVM
> > programs directly into the off-heap memory? Spark using DataFrames with
> > the Tungsten improvements will definitely help analytics within the JVM
> > world, but accessing outside 3rd-party C++ libraries is a challenge,
> > especially when trying to do it with zero copy.
> >
> > Ideally the off-heap memory would be accessible to a non-JVM program and
> > be invoked in-process using JNI per partition. The alternatives to this
> > involve the additional cost of starting another process (if using pipes)
> > as well as an additional copy of all the data.
> >
> > In addition to read-only non-JVM access in-process, would there be a way
> > to share a DataFrame that is in memory out of process and across Spark
> > contexts? This way an expensive, complicated initial build-up of a
> > DataFrame would not have to be replicated, and the startup costs would
> > not have to be paid again on failure.
> >
> > thanks,
> >
> > -paul
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
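For readers following the zero-copy discussion above: the JVM already has a standard mechanism for sharing memory with native code without copying, namely direct ByteBuffers, whose backing memory a C++ library can obtain as a raw pointer through JNI's GetDirectBufferAddress. Below is a minimal illustrative sketch of the Java side only; it does not reflect Tungsten's actual allocators or memory layout, which, as noted in the reply above, are internal and still in flux.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapSketch {
    public static void main(String[] args) {
        // Allocate 1 KiB outside the JVM heap. The GC will not relocate
        // this memory, so a native library can safely hold a pointer to it.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024)
                                   .order(ByteOrder.nativeOrder());

        // Write a couple of fixed-width values, as a columnar layout might.
        buf.putLong(0, 42L);
        buf.putLong(8, 43L);

        // A C++ library called in-process through JNI could read the same
        // bytes zero-copy via env->GetDirectBufferAddress(jbuf).
        System.out.println(buf.isDirect());   // true
        System.out.println(buf.getLong(0));   // 42
    }
}
```

The open questions raised in the thread (who frees the memory, and how native readers honor access contracts) are exactly what this mechanism does not solve by itself: the buffer's lifetime is tied to the Java object, so any pointer handed to native code must not outlive it.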