Hi,
On 9 December 2015 at 17:06, Marko Rodriguez <[email protected]> wrote:

> Hi Vasia,
>
> > Flink doesn't have a graph query language yet, so Gremlin support would be
> > a really nice contribution.
> > I have read the blog post and also the Gremlin paper. There are some really
> > great ideas in there!
>
> Great. Glad you are excited about Gremlin.
>
> > I'm currently quite busy with several projects, so I don't see myself
> > working on a FlinkGraphComputer soon. If someone from the TinkerPop
> > community would like to take this on, I (and the rest of the Flink
> > community) would of course be more than happy to provide feedback and help
> > with Flink-related issues. Otherwise, I'll get back to you once my load
> > levels decrease a bit :)
>
> In the past, TinkerPop used to be a "dumping ground" for all
> implementations, but we decided for TinkerPop3 that we would only have
> "reference implementations" so users can play, system providers can learn,
> and ultimately, system providers would provide TinkerPop support in their
> distribution. As such, we would like to have FlinkGraphComputer distributed
> with Flink. If that sounds like something your project would be comfortable
> with, I think we can provide a JIRA/PR for FlinkGraphComputer (as well as
> any necessary documentation). We can start with a JIRA ticket to get things
> going. Thoughts?
>

I see. This makes sense. It sounds like a good idea to me!
Let me sync with the Flink community, so we make sure we're all on the same
page. I'll cc dev@tinkerpop, so both communities can provide feedback.

Thanks!
-Vasia.

>
> Besides some I/O stuff (InputFormats, RDDs, etc.), this is the beef of the
> SparkGraphComputer implementation:
>
> https://github.com/apache/incubator-tinkerpop/tree/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer
>
> > Keep up the great work!
>
> Thanks, you too.
>
> Marko.
>
> http://markorodriguez.com
>
>
> > On 4 December 2015 at 11:28, James Thornton <[email protected]> wrote:
> >
> >> *Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink*
> >>
> >> https://youtu.be/-tFzG2dzJXw
> >>
> >> On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <[email protected]> wrote:
> >>
> >>> Hi Vasia (everyone),
> >>>
> >>> Does Flink have a graph query language? If not, then with a
> >>> FlinkGraphComputer implementation, Flink could ship with Gremlin support.
> >>>
> >>> If you have the time, please read the following blog post as it will help
> >>> explain our approach and how Flink could benefit from it:
> >>>
> >>> http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
> >>>
> >>> In short, if Flink provides a FlinkGraphComputer implementation, then the
> >>> Gremlin virtual machine will work over Flink and any language that compiles
> >>> to the Gremlin virtual machine will thus work over Flink.
> >>>
> >>> If you would like to see a demo of TinkerPop with, for example, Spark or
> >>> Giraph, I'd be more than happy to do a Google Hangout session with you (< 1
> >>> hour) so you can better understand the breadth of the work we are doing and
> >>> how it can benefit your efforts.
> >>>
> >>> Thanks Vasia,
> >>> Marko.
> >>>
> >>> http://markorodriguez.com
> >>>
> >>> On Nov 27, 2015, at 5:27 AM, Stephen Mallette <[email protected]> wrote:
> >>>
> >>>> Hi Vasia, I had started tinkering on it in my spare time in a separate
> >>>> repo. There really isn't much to collaborate on at this point. I was
> >>>> mostly trying to understand the parallels between Flink and Spark so that I
> >>>> could understand how a FlinkGraphComputer could be implemented given what
> >>>> I'd seen of the Spark implementation Marko did. I had expected to
> >>>> contribute the work to Flink (rather than keep it here on the TinkerPop
> >>>> side). Anyway, not much else to offer - Marko can probably get you running
> >>>> much faster than I can, as that area is where he holds the most expertise.
> >>>> You should probably keep an eye out for his comments.
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]> wrote:
> >>>>
> >>>>> Hi James and TinkerPop community,
> >>>>>
> >>>>> thanks a lot for starting this discussion!
> >>>>> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet you ;)
> >>>>>
> >>>>> I'm only starting to get familiar with the TinkerPop project, but it seems
> >>>>> that it can play well with Flink.
> >>>>> As you already noticed, a FlinkGraphComputer should be straightforward to
> >>>>> implement. Gelly has a vertex-centric API that is similar to the
> >>>>> scatter-gather model [1] and a gather-sum-apply API [2] that is closer to
> >>>>> the PowerGraph model. These are built on top of Flink's delta iteration
> >>>>> operators [3], which are more generic and could also be used directly for the
> >>>>> FlinkGraphComputer, if the existing Gelly abstractions won't work.
> >>>>>
> >>>>> Regarding the difference between stream and batch in Flink: Flink is a
> >>>>> streaming dataflow engine, on top of which you can run both streaming and
> >>>>> batch jobs. A batch job is simply seen by Flink as a job operating on a
> >>>>> finite stream. Accordingly, Flink has a stream and a batch API. Gelly is
> >>>>> currently built on top of the batch API, i.e. the DataSet API.
> >>>>>
> >>>>> James mentioned in the Flink mailing list that someone has already started
> >>>>> working on a FlinkGraphComputer. Is there a JIRA for this? Let me know if
> >>>>> you have questions or you think I can help in some way!
> >>>>>
> >>>>> Cheers,
> >>>>> -Vasia.
> >>>>>
> >>>>> [1]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> >>>>> [2]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> >>>>> [3]: https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
> >>>>>
> >>>>> On 25 November 2015 at 17:05, James Thornton <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Vasia -
> >>>>>>
> >>>>>> Welcome to TinkerPop (linking you into the Flink thread as requested)...
> >>>>>>
> >>>>>> - James
> >>>>>>
> >>>>>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi James,
> >>>>>>>
> >>>>>>> Thank you for always having an ear to the tech pulse. If it wasn't for you,
> >>>>>>> I would still be excited about XMPP and would be programming in Tcl/Tk.
> >>>>>>>
> >>>>>>> Given my 20-minute review of their docs ... It would be cool if, like the
> >>>>>>> "Table API," they also had a "Graph API" that was just TinkerPop
> >>>>>>> Graph/Vertex/Edge. That could be super intrusive, so as a simple step --
> >>>>>>> they already have a "vertex-centric" API and thus, having a
> >>>>>>> FlinkGraphComputer implementation seems "easy." Then from there, Gremlin
> >>>>>>> should just work. I don't really understand the difference between stream
> >>>>>>> and batch unless they are talking about the difference between "Storm" and
> >>>>>>> "MapReduce"? Would be cool to see how TinkerPop fits into the
> >>>>>>> stream-scene.
> >>>>>>>
> >>>>>>> Next, their fluent API is similar to Spark's and I would argue that
> >>>>>>> Gremlin's API is much nicer than just low-level primitives like map(),
> >>>>>>> flatMap(), etc. Thus, they could really benefit from having a full graph
> >>>>>>> query language already available for their users. (As a side note, it's
> >>>>>>> really nice to see more and more systems use functional/fluent APIs as this
> >>>>>>> really trains the next generation to think like this, which is important as
> >>>>>>> Gremlin is purely this! Hopefully the SQL model of querying starts to look
> >>>>>>> odd to people in comparison.)
> >>>>>>>
> >>>>>>> I just sent out this tweet:
> >>>>>>> https://twitter.com/apachetinkerpop/status/668820458599530497
> >>>>>>>
> >>>>>>> If they seem positive, I can detail in JIRA what would be required for
> >>>>>>> them to have TinkerPop-support.
> >>>>>>>
> >>>>>>> Thanks again James,
> >>>>>>> Marko.
> >>>>>>>
> >>>>>>> http://markorodriguez.com
> >>>>>>>
> >>>>>>> On Nov 19, 2015, at 12:19 PM, James Thornton <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi -
> >>>>>>>>
> >>>>>>>> Apache Flink has a graph API named Gelly...
> >>>>>>>>
> >>>>>>>> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> >>>>>>>>
> >>>>>>>> ...and Flink's "dedicated support for iterative operations" should pair
> >>>>>>>> well with Gremlin:
> >>>>>>>>
> >>>>>>>> https://flink.apache.org/features.html
> >>>>>>>>
> >>>>>>>> Has anyone dug into this yet?
> >>>>>>>>
> >>>>>>>> - James
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> James Thornton, http://electricspeed.com
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> James Thornton, http://electricspeed.com
