GraphFrames is just a graph analytics/query engine, not a graph engine, which is what GraphX used to be.
And I'm sorry to say, it doesn’t fit most scenarios at all, in fact.

Enzo, I don’t think there is any roadmap for graph libraries in Spark for now.

*Andy*

On Tue, Mar 14, 2017 at 7:28 AM, Tim Hunter <timhun...@databricks.com> wrote:

> Hello Enzo,
>
> since this question is also relevant to Spark, I will answer it here. The
> goal of GraphFrames is to provide graph capabilities along with excellent
> integration with the rest of the Spark ecosystem (using modern APIs such as
> DataFrames). As you seem to be well aware, a large number of graph
> algorithms can be implemented in terms of a small subset of graph
> primitives. These graph primitives can be translated to Spark operations,
> but we feel that some important low-level optimizations should be added to
> the Catalyst engine in order to realize the true potential of GraphFrames.
> You can find a flavor of this work in this presentation by Ankur Dave [1].
> This is still an area of collaboration with the Spark core team, and we
> would like to merge GraphFrames into Spark 2.x eventually.
>
> Where does it leave us for the time being? GraphFrames is actively
> supported, and we implemented a highly scalable version of GraphFrames in
> November. As you mentioned, there are a number of distributed graph
> frameworks out there, but to my knowledge they are not as easy to integrate
> with Spark. The current approach has been to reach parity with GraphX first
> and then add new algorithms based on popular demand. Along these lines,
> GraphBLAS could be added on top of it if someone is willing to step up.
>
> Tim
>
> [1] https://spark-summit.org/east-2016/events/graphframes-graph-queries-in-spark-sql/
>
> On Mon, Mar 13, 2017 at 2:58 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Since GraphFrames is not part of the Spark project, your
>> GraphFrames-specific questions are probably better directed at the
>> GraphFrames issue tracker:
>>
>> https://github.com/graphframes/graphframes/issues
>>
>> As far as I know, GraphFrames is an active project, though not as active
>> as Spark of course. There will be lulls in development since the people
>> driving that project forward also have major commitments to other projects.
>> This is natural.
>>
>> If you post on GitHub I would wager someone there (maybe Joseph or Tim
>> <https://github.com/graphframes/graphframes/graphs/contributors>?)
>> should be able to answer your questions about GraphFrames.
>>
>> 1. The page you linked refers to a *plan* to move GraphFrames to the
>> standard Spark release cycle. Is this *plan* publicly available /
>> visible?
>>
>> I didn’t see any such reference to a plan in the page I linked you to.
>> Rather, the page says
>> <http://graphframes.github.io/#what-are-graphframes>:
>>
>> The current plan is to keep GraphFrames separate from core Apache Spark
>> for the time being.
>>
>> Nick
>>
>> On Mon, Mar 13, 2017 at 5:46 PM enzo <e...@smartinsightsfromdata.com>
>> wrote:
>>
>>> Nick
>>>
>>> Thanks for the quick answer :)
>>>
>>> Sadly, the comment in the page doesn’t answer my questions. More
>>> specifically:
>>>
>>> 1. GraphFrames' last activity on GitHub was 2 months ago. Last release
>>> on 12 Nov 2016. Until recently, 2 months was close to a Spark release
>>> cycle. Why has there been no major development since mid-November?
>>>
>>> 2. The page you linked refers to a *plan* to move GraphFrames to the
>>> standard Spark release cycle. Is this *plan* publicly available / visible?
>>>
>>> 3. I couldn’t find any statement of intent to preserve one or the
>>> other API, or to merge them: in other words, there seems to be no
>>> overarching plan for a cohesive & comprehensive graph API (I apologise in
>>> advance if I’m wrong).
>>>
>>> 4. I was initially impressed by the GraphFrames syntax, in places similar to
>>> Neo4j Cypher (now open source), but later I understood it was an incomplete
>>> lightweight experiment (with no intention to move to full compatibility,
>>> perhaps for good reasons). To me it sort of gave the wrong message.
>>>
>>> 5. In the meantime the world of graphs is changing. The GraphBLAS forum
>>> seems to be gaining some traction: a library based on GraphBLAS has been
>>> made available on Accumulo (Graphulo). Assuming that Spark is NOT going to
>>> adopt similar lines, nor to follow DataStax with TinkerPop and Gremlin,
>>> again, what is the new, cohesive & comprehensive API that Spark is going
>>> to deliver?
>>>
>>> Sadly, the API uncertainty may force developers towards more stable
>>> APIs / platforms & roadmaps.
>>>
>>> Thanks Enzo
>>>
>>> On 13 Mar 2017, at 22:09, Nicholas Chammas <nicholas.cham...@gmail.com>
>>> wrote:
>>>
>>> Your question is answered here under "Will GraphFrames be part of Apache
>>> Spark?", no?
>>>
>>> http://graphframes.github.io/#what-are-graphframes
>>>
>>> Nick
>>>
>>> On Mon, Mar 13, 2017 at 4:56 PM enzo <e...@smartinsightsfromdata.com>
>>> wrote:
>>>
>>> Please see this email trail: no answer so far on the user@spark
>>> board. Trying the developer board for better luck.
>>>
>>> The question:
>>>
>>> I am a bit confused by the current roadmap for graph and graph analytics
>>> in Apache Spark.
>>>
>>> I understand that we have had for some time two libraries (the following
>>> is my understanding - please amend as appropriate!):
>>>
>>> - GraphX, part of the Spark project. This library is based on RDDs and is
>>> only accessible via Scala. It doesn’t look like this library has been
>>> enhanced recently.
>>> - GraphFrames, an independent (at the moment?) library for Spark. This
>>> library is based on Spark DataFrames and accessible from Scala & Python.
>>> Last commit on GitHub was 2 months ago.
>>>
>>> GraphFrames came about with the promise of being integrated into
>>> Apache Spark at some point.
>>>
>>> I can see other projects coming up with interesting libraries and ideas
>>> (e.g. Graphulo on Accumulo, a new project with the goal of implementing
>>> the GraphBLAS building blocks for graph algorithms on top of Accumulo).
>>>
>>> Where is Apache Spark going?
>>>
>>> Where are graph libraries in the roadmap?
>>>
>>> Thanks for any clarity brought to this matter.
>>>
>>> Thanks Enzo
>>>
>>> Begin forwarded message:
>>>
>>> *From: *"Md. Rezaul Karim" <rezaul.ka...@insight-centre.org>
>>> *Subject: **Re: Question on Spark's graph libraries*
>>> *Date: *10 March 2017 at 13:13:15 CET
>>> *To: *Robin East <robin.e...@xense.co.uk>
>>> *Cc: *enzo <e...@smartinsightsfromdata.com>, spark users <u...@spark.apache.org>
>>>
>>> +1
>>>
>>> Regards,
>>> _________________________________
>>> *Md. Rezaul Karim*, BSc, MSc
>>> PhD Researcher, INSIGHT Centre for Data Analytics
>>> National University of Ireland, Galway
>>> IDA Business Park, Dangan, Galway, Ireland
>>> Web: http://www.reza-analytics.eu/index.html
>>> <http://139.59.184.114/index.html>
>>>
>>> On 10 March 2017 at 12:10, Robin East <robin.e...@xense.co.uk> wrote:
>>>
>>> I would love to know the answer to that too.
>>> -------------------------------------------------------------------------------
>>> Robin East
>>> *Spark GraphX in Action* Michael Malak and Robin East
>>> Manning Publications Co.
>>> http://www.manning.com/books/spark-graphx-in-action
>>>
>>> On 9 Mar 2017, at 17:42, enzo <e...@smartinsightsfromdata.com> wrote:
>>>
>>> I am a bit confused by the current roadmap for graph and graph analytics
>>> in Apache Spark.
>>>
>>> I understand that we have had for some time two libraries (the following
>>> is my understanding - please amend as appropriate!):
>>>
>>> - GraphX, part of the Spark project. This library is based on RDDs and is
>>> only accessible via Scala. It doesn’t look like this library has been
>>> enhanced recently.
>>> - GraphFrames, an independent (at the moment?) library for Spark. This
>>> library is based on Spark DataFrames and accessible from Scala & Python.
>>> Last commit on GitHub was 2 months ago.
>>>
>>> GraphFrames came about with the promise of being integrated into
>>> Apache Spark at some point.
>>>
>>> I can see other projects coming up with interesting libraries and ideas
>>> (e.g. Graphulo on Accumulo, a new project with the goal of implementing
>>> the GraphBLAS building blocks for graph algorithms on top of Accumulo).
>>>
>>> Where is Apache Spark going?
>>>
>>> Where are graph libraries in the roadmap?
>>>
>>> Thanks for any clarity brought to this matter.
>>>
>>> Enzo