Re: Gremlin on Flink & Gelly?

Vasiliki Kalavri Tue, 08 Dec 2015 09:56:56 -0800

Hi all,

thank you for your replies and sorry for the long silence.


Flink doesn't have a graph query language yet, so Gremlin support would be
a really nice contribution.
I have read the blog post and also the Gremlin paper. There are some really
great ideas in there!

I'm currently quite busy with several projects, so I don't see myself
working on a FlinkGraphComputer soon. If someone from the TinkerPop
community would like to take this on, I (and the rest of the Flink
community) would of course be more than happy to provide feedback and help
with Flink-related issues. Otherwise, I'll get back to you once my load
levels decrease a bit :)

Keep up the great work!

Best,
-Vasia.

On 4 December 2015 at 11:28, James Thornton <[email protected]> wrote:

> *Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink*
>
> https://youtu.be/-tFzG2dzJXw
> On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <[email protected]> wrote:
>
> > Hi Vasia (everyone),
> >
> > Does Flink have a graph query language? If not, then with a
> > FlinkGraphComputer implementation, Flink could ship with Gremlin support.
> >
> > If you have the time, please read the following blog post as it will help
> > explain our approach and how Flink could benefit from it:
> >
> >
> http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
> >
> > In short, if Flink provides a FlinkGraphComputer implementation, then the
> > Gremlin virtual machine will work over Flink and any language that compiles
> > to the Gremlin virtual machine will thus work over Flink.
> >
> > If you would like to see a demo of TinkerPop with, for example Spark or
> > Giraph, I'd be more than happy to do a Google Hangout session with you (< 1
> > hour) so you can better understand the breadth of the work we are doing and
> > how it can benefit your efforts.
> >
> > Thanks Vasia,
> > Marko.
> >
> > http://markorodriguez.com
> >
> > On Nov 27, 2015, at 5:27 AM, Stephen Mallette <[email protected]>
> > wrote:
> >
> > > Hi Vasia, I had started tinkering on it in my spare time in a separate
> > > repo.  There really isn't much to collaborate on at this point.  I was
> > > mostly trying to understand the parallels between Flink and Spark so that I
> > > could understand how a FlinkGraphComputer could be implemented given what
> > > I'd seen of the Spark implementation Marko did.  I had expected to
> > > contribute the work to Flink (rather than keep it here on the TinkerPop
> > > side).  Anyway, not much else to offer - Marko can probably get you running
> > > much faster than I can, as that area is where he holds the most expertise.
> > > You should probably keep an eye out for his comments.
> > >
> > >
> > >
> > > On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]>
> > wrote:
> > >
> > >> Hi James and TinkerPop community,
> > >>
> > >> thanks a lot for starting this discussion!
> > >> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet you ;)
> > >>
> > >> I'm only starting to get familiar with the TinkerPop project, but it seems
> > >> that it can play well with Flink.
> > >> As you already noticed, a FlinkGraphComputer should be straightforward to
> > >> implement. Gelly has a vertex-centric API that is similar to the
> > >> scatter-gather model [1] and a gather-sum-apply API [2] that is closer to
> > >> the PowerGraph model. These are built on top of Flink's delta iteration
> > >> operators [3], which are more generic and could also be used directly for the
> > >> FlinkGraphComputer, if the existing Gelly abstractions won't work.
> > >>
> > >> Regarding the difference between stream and batch in Flink: Flink is a
> > >> streaming dataflow engine, on top of which you can run both streaming and
> > >> batch jobs. A batch job is simply seen by Flink as a job operating on a
> > >> finite stream. Accordingly, Flink has a stream API and a batch API. Gelly is
> > >> currently built on top of the batch API, i.e. the DataSet API.
> > >>
> > >> James mentioned in the Flink mailing list that someone has already started
> > >> working on a FlinkGraphComputer. Is there a JIRA for this? Let me know if
> > >> you have questions or you think I can help in some way!
> > >>
> > >> Cheers,
> > >> -Vasia.
> > >>
> > >> [1]:
> > >> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> > >> [2]:
> > >> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> > >> [3]:
> > >> https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
> > >>
> > >> On 25 November 2015 at 17:05, James Thornton <[email protected]
> >
> > >> wrote:
> > >>
> > >>> Hi Vasia -
> > >>>
> > >>> Welcome to TinkerPop (linking you into the Flink thread as
> > requested)...
> > >>>
> > >>> - James
> > >>>
> > >>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <
> > [email protected]>
> > >>> wrote:
> > >>>
> > >>>> Hi James,
> > >>>>
> > >>>> Thank you for always having an ear to the tech pulse. If it wasn't for you,
> > >>>> I would still be excited about XMPP and would be programming in Tcl/Tk.
> > >>>>
> > >>>> Given my 20-minute review of their docs …… It would be cool if, like the
> > >>>> "Table API," they also had a "Graph API" that was just TinkerPop
> > >>>> Graph/Vertex/Edge. That could be super intrusive, so as a simple step --
> > >>>> they already have a "vertex-centric" API and thus, having a
> > >>>> FlinkGraphComputer implementation seems "easy." Then from there, Gremlin
> > >>>> should just work. I don't really understand the difference between stream
> > >>>> and batch unless they are talking about the difference between "Storm" and
> > >>>> "MapReduce"? Would be cool to see how TinkerPop fits into the
> > >>>> stream-scene.
> > >>>>
> > >>>> Next, their fluent API is similar to Spark's and I would argue that
> > >>>> Gremlin's API is much nicer than just low-level primitives like map(),
> > >>>> flatMap(), etc. Thus, they could really benefit from having a full graph
> > >>>> query language already available for their users. (As a side note, it's
> > >>>> really nice to see more and more systems use functional/fluent APIs, as this
> > >>>> really trains the next generation to think like this, which is important as
> > >>>> Gremlin is purely this! Hopefully the SQL model of querying starts to look
> > >>>> odd to people in comparison.)
> > >>>>
> > >>>> I just sent out this tweet:
> > >>>>
> > >>>> https://twitter.com/apachetinkerpop/status/668820458599530497
> > >>>>
> > >>>> If they seem positive, I can detail in JIRA what would be required for
> > >>>> them to have TinkerPop-support.
> > >>>>
> > >>>> Thanks again James,
> > >>>> Marko.
> > >>>>
> > >>>> http://markorodriguez.com
> > >>>>
> > >>>> On Nov 19, 2015, at 12:19 PM, James Thornton <
> [email protected]
> > >
> > >>>> wrote:
> > >>>>
> > >>>>> Hi -
> > >>>>>
> > >>>>> Apache Flink has a graph API named Gelly...
> > >>>>>
> > >>>>>
> > >> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> > >>>>>
> > >>>>> ...and Flink's "dedicated support for iterative operations" should
> > >> pair
> > >>>>> well with Gremlin:
> > >>>>>
> > >>>>> https://flink.apache.org/features.html
> > >>>>>
> > >>>>> Has anyone dug into this yet?
> > >>>>>
> > >>>>> - James
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> James Thornton, *http://electricspeed.com <
> http://electricspeed.com
> > >>> *
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> James Thornton, *http://electricspeed.com <http://electricspeed.com
> >*
> > >>>
> > >>
> >
> >
>
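
For readers following the thread, here is a rough illustration of the scatter-gather, vertex-centric style of iteration that the quoted messages mention for Gelly, combined with the "only keep working on what changed" idea behind delta iterations. This is a plain Python sketch, not Flink, Gelly, or TinkerPop code: the function name `scatter_gather_sssp`, its parameters, and the choice of single-source shortest paths as the example algorithm are all assumptions made for illustration.

```python
from collections import defaultdict


def scatter_gather_sssp(edges, source, max_supersteps=100):
    """Single-source shortest paths in a scatter-gather style (illustrative only).

    edges: iterable of (src, dst, weight) tuples with non-negative weights.
    Returns a dict mapping each vertex to its distance from `source`
    (float("inf") if the vertex is unreachable).
    """
    out_edges = defaultdict(list)
    vertices = {source}
    for src, dst, weight in edges:
        out_edges[src].append((dst, weight))
        vertices.update((src, dst))

    # "Solution set": the current value of every vertex.
    distance = {v: float("inf") for v in vertices}
    distance[source] = 0.0
    # "Workset": vertices whose value changed in the previous superstep.
    active = {source}

    for _ in range(max_supersteps):
        if not active:
            break  # nothing changed, so the iteration has converged
        # Scatter: every active vertex sends a candidate distance to its neighbours.
        inbox = defaultdict(list)
        for v in active:
            for dst, weight in out_edges[v]:
                inbox[dst].append(distance[v] + weight)
        # Gather: each receiver keeps the minimum; only improved vertices stay active.
        active = set()
        for v, candidates in inbox.items():
            best = min(candidates)
            if best < distance[v]:
                distance[v] = best
                active.add(v)
    return distance
```

Calling `scatter_gather_sssp([("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)], "a")` yields distances 0, 1, 3 and 4 for `a`, `b`, `c` and `d`. A real FlinkGraphComputer would express the same superstep loop through the engine's own iteration operators rather than an in-memory loop like this one.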
