Hi, Kazuaki,

It's very similar to my requirement, thanks! It seems they want to write to a C++ process with zero copy, while I want to do both reads and writes with zero copy. Does anyone know how to obtain more information, such as the current status of this JIRA entry?
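For concreteness, here is a minimal sketch of what I have in mind on the JVM side, assuming the C++ process exposes a fixed-size region as a file under /dev/shm. The path, offsets, and layout below are placeholders, not anything that exists today:

import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedRegionSketch {
    public static void main(String[] args) throws Exception {
        // Map a region that a co-located C++ process has published as a file on
        // tmpfs, so both processes see the same physical pages without copying.
        try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/cxx_region", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);  // both sides must agree on byte order

            long firstValue = buf.getLong(0);    // zero-copy read of data written by C++
            buf.putLong(8, firstValue + 1);      // zero-copy write, visible to the C++ side
        }
    }
}

Since /dev/shm is tmpfs on Linux, the mapping is effectively shared memory, and no JNI is needed for the plain read/write path; the open question for me is how to hook something like this into Spark so that each executor only maps the region owned by the C++ process on its own node.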
Best Regards,
Jia

On Dec 7, 2015, at 12:26 PM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:

> Is this JIRA entry related to what you want?
> https://issues.apache.org/jira/browse/SPARK-10399
>
> Regards,
> Kazuaki Ishizaki
>
> From: Jia <jacqueline...@gmail.com>
> To: Dewful <dew...@gmail.com>
> Cc: "user @spark" <u...@spark.apache.org>, dev@spark.apache.org, Robin East <robin.e...@xense.co.uk>
> Date: 2015/12/08 03:17
> Subject: Re: Shared memory between C++ process and Spark
>
> Thanks, Dewful!
>
> My impression is that Tachyon is a very nice in-memory file system that can connect to multiple storage backends.
> However, because our data is also held in memory, I suspect that connecting to Spark directly may give better performance.
> But I definitely need to look at Tachyon more carefully, in case it has a very efficient C++ binding mechanism.
>
> Best Regards,
> Jia
>
> On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:
>
> Maybe looking into something like Tachyon would help; I see some sample C++ bindings, but I'm not sure how much of the current functionality they support...
>
> Hi, Robin,
> Thanks for your reply and for copying my question to the user mailing list.
> Yes, we have a distributed C++ application that stores data on each node in the cluster, and we hope to leverage Spark to do more advanced analytics on that data. But we need high performance, which is why we want shared memory.
> Suggestions will be highly appreciated!
>
> Best Regards,
> Jia
>
> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
>
> -dev, +user (this is not a question about development of Spark itself, so you'll get more answers on the user mailing list)
>
> First up, let me say that I don't really know how this could be done - I'm sure it would be possible with enough tinkering, but it's not clear what you are trying to achieve. Spark is a distributed processing system; it has multiple JVMs running on different machines that each run a small part of the overall processing. Unless you have some sort of idea to have multiple C++ processes collocated with the distributed JVMs, using named memory mapped files doesn't make architectural sense.
> -------------------------------------------------------------------------------
> Robin East
> Spark GraphX in Action, Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>
> Dears, for one project, I need to implement something so Spark can read data from a C++ process.
> To provide high performance, I really hope to implement this through shared memory between the C++ process and the JVM process.
> It seems it may be possible to use named memory mapped files and JNI to do this, but I wonder whether there are any existing efforts or a more efficient approach to do this?
> Thank you very much!
>
> Best Regards,
> Jia
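P.S. On the Spark side, here is a rough sketch of how each executor might scan its node-local region inside a job. Again, the path and record layout are placeholders, and this ignores the real problem of pinning each task to the node whose C++ process owns that region (which would need preferred locations or a custom RDD):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalRegionJobSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("shared-memory-sketch");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // One entry per node-local region; in practice these would come from
            // whatever coordination the C++ side provides.
            List<String> regionPaths = Arrays.asList("/dev/shm/cxx_region");

            JavaRDD<Long> perNodeSums = jsc.parallelize(regionPaths, regionPaths.size())
                .map(path -> {
                    try (RandomAccessFile raf = new RandomAccessFile(path, "r");
                         FileChannel ch = raf.getChannel()) {
                        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                        long sum = 0;
                        // Assume the region is a packed array of 8-byte longs.
                        for (int off = 0; off + 8 <= buf.capacity(); off += 8) {
                            sum += buf.getLong(off);  // read directly out of the mapping
                        }
                        return sum;
                    }
                });

            System.out.println("per-node sums: " + perNodeSums.collect());
        }
    }
}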