Hi all, thank you for your replies and sorry for the long silence.
Flink doesn't have a graph query language yet, so Gremlin support would be a really nice contribution. I have read the blog post and also the Gremlin paper. There are some really great ideas in there!

I'm currently quite busy with several projects, so I don't see myself working on a FlinkGraphComputer soon. If someone from the TinkerPop community would like to take this on, I (and the rest of the Flink community) would of course be more than happy to provide feedback and help with Flink-related issues. Otherwise, I'll get back to you once my load levels decrease a bit :)

Keep up the great work!

Best,
-Vasia.

On 4 December 2015 at 11:28, James Thornton <[email protected]> wrote:

> *Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink*
>
> https://youtu.be/-tFzG2dzJXw
>
> On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <[email protected]> wrote:
>
> > Hi Vasia (everyone),
> >
> > Does Flink have a graph query language? If not, then with a
> > FlinkGraphComputer implementation, Flink could ship with Gremlin support.
> >
> > If you have the time, please read the following blog post as it will help
> > explain our approach and how Flink could benefit from it:
> >
> > http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
> >
> > In short, if Flink provides a FlinkGraphComputer implementation, then the
> > Gremlin virtual machine will work over Flink and any language that compiles
> > to the Gremlin virtual machine will thus work over Flink.
> >
> > If you would like to see a demo of TinkerPop with, for example Spark or
> > Giraph, I'd be more than happy to do a Google Hangout session with you (< 1
> > hour) so you can better understand the breadth of the work we are doing and
> > how it can benefit your efforts.
> >
> > Thanks Vasia,
> > Marko.
> >
> > http://markorodriguez.com
> >
> > On Nov 27, 2015, at 5:27 AM, Stephen Mallette <[email protected]> wrote:
> >
> > > Hi Vasia, I had started tinkering on it in my spare time in a separate
> > > repo. There really isn't much to collaborate on at this point. I was
> > > mostly trying to understand the parallels between Flink and Spark so that I
> > > could understand how a FlinkGraphComputer could be implemented given what
> > > I'd seen of the Spark implementation Marko did. I had expected to
> > > contribute the work to Flink (rather than keep it here on the TinkerPop
> > > side). Anyway, not much else to offer - Marko can probably get you running
> > > much faster than I can, as that area is where he holds the most expertise.
> > > You should probably keep an eye out for his comments.
> > >
> > > On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]> wrote:
> > >
> > >> Hi James and TinkerPop community,
> > >>
> > >> thanks a lot for starting this discussion!
> > >> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet you ;)
> > >>
> > >> I'm only starting to get familiar with the TinkerPop project, but it seems
> > >> that it can play well with Flink.
> > >> As you already noticed, a FlinkGraphComputer should be straight-forward to
> > >> implement. Gelly has a vertex-centric API that is similar to the
> > >> scatter-gather model [1] and a gather-sum-apply API [2] that is closer to
> > >> the Powergraph model. These are built on top of Flink's delta iteration
> > >> operators, which are more generic and could also be used directly for the
> > >> FlinkGraphComputer, if the existing Gelly abstractions won't work.
> > >>
> > >> Regarding the difference between stream and batch in Flink. Flink is a
> > >> streaming dataflow engine, on top of which you can run both streaming and
> > >> batch jobs.
> > >> A batch job is simply seen by Flink as a job operating on a
> > >> finite stream. Respectively, Flink has a stream and a batch API. Gelly is
> > >> currently built on top of the batch API, i.e. the DataSet API.
> > >>
> > >> James mentioned in the Flink mailing list that someone has already started
> > >> working on a FlinkGraphComputer. Is there a JIRA for this? Let me know if
> > >> you have questions or you think I can help in some way!
> > >>
> > >> Cheers,
> > >> -Vasia.
> > >>
> > >> [1]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> > >> [2]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> > >> [3]: https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
> > >>
> > >> On 25 November 2015 at 17:05, James Thornton <[email protected]> wrote:
> > >>
> > >>> Hi Vasia -
> > >>>
> > >>> Welcome to TinkerPop (linking you into the Flink thread as requested)...
> > >>>
> > >>> - James
> > >>>
> > >>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <[email protected]> wrote:
> > >>>
> > >>>> Hi James,
> > >>>>
> > >>>> Thank you for always having an ear to the tech pulse. If it wasn't for you,
> > >>>> I would still be excited about XMPP and would be programming in Tcl/Tk.
> > >>>>
> > >>>> Given my 20 minute review of their docs …… It would be cool if like the
> > >>>> "Table API," they also had a "Graph API" that was just TinkerPop
> > >>>> Graph/Vertex/Edge. That could be super intrusive, so as a simple step --
> > >>>> they already have a "vertex-centric" API and thus, having a
> > >>>> FlinkGraphComputer implementation seems "easy." Then from there, Gremlin
> > >>>> should just work.
> > >>>> I don't really understand the difference between stream
> > >>>> and batch unless they are talking about the difference between "Storm" and
> > >>>> "MapReduce"? Would be cool to see how TinkerPop fits into the
> > >>>> stream-scene.
> > >>>>
> > >>>> Next, their fluent API is similar to Spark's and I would argue that
> > >>>> Gremlin's API is much nicer than just low-level primitives like map(),
> > >>>> flatMap(), etc. Thus, they could really benefit from having a full graph
> > >>>> query language already available for their users. (As a side note, it's
> > >>>> really nice to see more and more systems use functional/fluent APIs as this
> > >>>> really trains the next generation to think like this which is important as
> > >>>> Gremlin is purely this! Hopefully the SQL model of querying starts to look
> > >>>> odd to people in comparison.)
> > >>>>
> > >>>> I just sent out this tweet:
> > >>>> https://twitter.com/apachetinkerpop/status/668820458599530497
> > >>>>
> > >>>> If they seem positive, I can detail in JIRA what would be required for
> > >>>> them to have TinkerPop-support.
> > >>>>
> > >>>> Thanks again James,
> > >>>> Marko.
> > >>>>
> > >>>> http://markorodriguez.com
> > >>>>
> > >>>> On Nov 19, 2015, at 12:19 PM, James Thornton <[email protected]> wrote:
> > >>>>
> > >>>>> Hi -
> > >>>>>
> > >>>>> Apache Flink has a graph API named Gelly...
> > >>>>>
> > >>>>> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> > >>>>>
> > >>>>> ...and Flink's "dedicated support for iterative operations" should pair
> > >>>>> well with Gremlin:
> > >>>>>
> > >>>>> https://flink.apache.org/features.html
> > >>>>>
> > >>>>> Has anyone dug into this yet?
> > >>>>>
> > >>>>> - James
> > >>>>>
> > >>>>> --
> > >>>>> James Thornton, *http://electricspeed.com*
> > >>>
> > >>> --
> > >>> James Thornton, *http://electricspeed.com*
