Annabel,

Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc.
Robin

> On 7 Dec 2015, at 18:50, Annabel Melongo <melongo_anna...@yahoo.com> wrote:
>
> Jia,
>
> I'm so confused about this. The architecture of Spark is to run on top of HDFS.
> What you're requesting, reading and writing to a C++ process, is not part of
> that requirement.
>
> On Monday, December 7, 2015 1:42 PM, Jia <jacqueline...@gmail.com> wrote:
>
> Thanks, Annabel, but I may need to clarify that I have no intention of writing
> and running Spark UDFs in C++; I'm just wondering whether Spark can read and
> write data to a C++ process with zero copy.
>
> Best Regards,
> Jia
>
>> On Dec 7, 2015, at 12:26 PM, Annabel Melongo <melongo_anna...@yahoo.com>
>> wrote:
>>
>> My guess is that Jia wants to run C++ on top of Spark. If that's the case,
>> I'm afraid this is not possible. Spark has support for Java, Python, Scala
>> and R.
>>
>> The best way to achieve this is to run your application in C++ and use the
>> data created by said application to do manipulation within Spark.
>>
>> On Monday, December 7, 2015 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>>
>> Thanks, Dewful!
>>
>> My impression is that Tachyon is a very nice in-memory file system that can
>> connect to multiple storage systems.
>> However, because our data is also held in memory, I suspect that connecting
>> to Spark directly may be more efficient.
>> But I definitely need to look at Tachyon more carefully, in case it has a
>> very efficient C++ binding mechanism.
>>
>> Best Regards,
>> Jia
>>
>>> On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:
>>>
>>> Maybe looking into something like Tachyon would help. I see some sample C++
>>> bindings, not sure how much of the current functionality they support...
>>>
>>> Hi, Robin,
>>> Thanks for your reply and thanks for copying my question to the user mailing
>>> list.
>>> Yes, we have a distributed C++ application that will store data on each
>>> node in the cluster, and we hope to leverage Spark to do more fancy
>>> analytics on those data. But we need high performance; that’s why we want
>>> shared memory.
>>> Suggestions will be highly appreciated!
>>>
>>> Best Regards,
>>> Jia
>>>
>>>> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
>>>>
>>>> -dev, +user (this is not a question about development of Spark itself so
>>>> you’ll get more answers in the user mailing list)
>>>>
>>>> First up let me say that I don’t really know how this could be done. I’m
>>>> sure it would be possible with enough tinkering but it’s not clear what
>>>> you are trying to achieve. Spark is a distributed processing system; it
>>>> has multiple JVMs running on different machines that each run a small part
>>>> of the overall processing. Unless you have some sort of idea to have
>>>> multiple C++ processes collocated with the distributed JVMs, using named
>>>> memory mapped files doesn’t make architectural sense.
>>>> -------------------------------------------------------------------------------
>>>> Robin East
>>>> Spark GraphX in Action, Michael Malak and Robin East
>>>> Manning Publications Co.
>>>> http://www.manning.com/books/spark-graphx-in-action
>>>>
>>>>> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>>>>>
>>>>> Dears, for one project, I need to implement something so Spark can read
>>>>> data from a C++ process.
>>>>> To provide high performance, I really hope to implement this through
>>>>> shared memory between the C++ process and the Java JVM process.
>>>>> It seems it may be possible to use named memory mapped files and JNI to
>>>>> do this, but I wonder whether there are any existing efforts or a more
>>>>> efficient approach to do this?
>>>>> Thank you very much!
>>>>>
>>>>> Best Regards,
>>>>> Jia
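
For anyone reading this thread later, here is a rough sketch of the "named memory mapped file" idea from the original question. It is only an illustration under stated assumptions, not something anyone in the thread proposed as a finished design: the file layout (an 8-byte little-endian count followed by int64 values), the function names, and the file name are all hypothetical, and Python stands in for both the C++ producer and the Spark-side consumer. On the JVM the analogous standard call is java.nio's FileChannel.map, which returns a MappedByteBuffer.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout, assumed for illustration only:
# an 8-byte little-endian record count, then that many little-endian int64s.
HEADER = struct.Struct("<q")
VALUE = struct.Struct("<q")


def write_region(path, values):
    """Stand-in for the C++ producer: fill a file-backed mapping."""
    size = HEADER.size + VALUE.size * len(values)
    with open(path, "w+b") as f:
        f.truncate(size)  # the backing file must have its final size before mapping
        with mmap.mmap(f.fileno(), size) as region:
            HEADER.pack_into(region, 0, len(values))
            struct.pack_into("<%dq" % len(values), region, HEADER.size, *values)
            region.flush()  # push the dirty pages back to the backing file


def sum_region(path):
    """Consumer side: map the same file read-only and decode values in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            (count,) = HEADER.unpack_from(region, 0)
            # Each value is decoded straight from the mapped pages; the file is
            # never read into an intermediate bytes object.
            return sum(
                VALUE.unpack_from(region, HEADER.size + i * VALUE.size)[0]
                for i in range(count)
            )


def demo():
    """Write four values through one mapping and sum them through another."""
    path = os.path.join(tempfile.mkdtemp(), "shared_region.bin")  # hypothetical name
    write_region(path, [1, 2, 3, 40])
    return sum_region(path)
```

As Robin points out, this only helps if the producer process runs on the same machine as the executor that reads the mapping, so some scheme for collocating the C++ processes with the Spark JVMs would still be needed.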