Re: Gremlin on Flink & Gelly?

Vasiliki Kalavri Wed, 09 Dec 2015 11:24:14 -0800

Hi,


On 9 December 2015 at 17:06, Marko Rodriguez <[email protected]> wrote:

> Hi Vasia,
>
> > Flink doesn't have a graph query language yet, so Gremlin support would
> > be a really nice contribution.
> > I have read the blog post and also the Gremlin paper. There are some
> > really great ideas in there!
>
> Great. Glad you are excited about Gremlin.
>
> > I'm currently quite busy with several projects, so I don't see myself
> > working on a FlinkGraphComputer soon. If someone from the TinkerPop
> > community would like to take this on, I (and the rest of the Flink
> > community) would of course be more than happy to provide feedback and
> > help with Flink-related issues. Otherwise, I'll get back to you once my
> > load levels decrease a bit :)
>
> In the past, TinkerPop used to be a "dumping ground" for all
> implementations, but we decided for TinkerPop3 that we would only have
> "reference implementations" so users can play, system providers can learn,
> and ultimately, system providers would provide TinkerPop support in their
> distribution. As such, we would like to have FlinkGraphComputer distributed
> with Flink. If that sounds like something your project would be comfortable
> with, I think we can provide a JIRA/PR for FlinkGraphComputer (as well as
> any necessary documentation). We can start with a JIRA ticket to get things
> going. Thoughts?
>

I see. This makes sense.
It sounds like a good idea to me! Let me sync with the Flink community, so
we make sure we're all on the same page.
I'll cc dev@tinkerpop, so both communities can provide feedback.

Thanks!
-Vasia.



>
> Besides some I/O stuff (InputFormats, RDDs, etc.), this is the beef of the
> SparkGraphComputer implementation:
>
> https://github.com/apache/incubator-tinkerpop/tree/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer
>
> > Keep up the great work!
>
> Thanks, you too.
>
> Marko.
>
> http://markorodriguez.com
>
>
>
> >
> > On 4 December 2015 at 11:28, James Thornton <[email protected]>
> > wrote:
> >
> >> Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink
> >>
> >> https://youtu.be/-tFzG2dzJXw
> >>
> >> On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <[email protected]>
> >> wrote:
> >>
> >>> Hi Vasia (everyone),
> >>>
> >>> Does Flink have a graph query language? If not, then with a
> >>> FlinkGraphComputer implementation, Flink could ship with Gremlin
> >>> support.
> >>>
> >>> If you have the time, please read the following blog post as it will
> >>> help explain our approach and how Flink could benefit from it:
> >>>
> >>> http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
> >>>
> >>> In short, if Flink provides a FlinkGraphComputer implementation, then
> >>> the Gremlin virtual machine will work over Flink and any language that
> >>> compiles to the Gremlin virtual machine will thus work over Flink.
> >>>
> >>> If you would like to see a demo of TinkerPop with, for example Spark or
> >>> Giraph, I'd be more than happy to do a Google Hangout session with you
> >>> (< 1 hour) so you can better understand the breadth of the work we are
> >>> doing and how it can benefit your efforts.
> >>>
> >>> Thanks Vasia,
> >>> Marko.
> >>>
> >>> http://markorodriguez.com
> >>>
> >>> On Nov 27, 2015, at 5:27 AM, Stephen Mallette <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Vasia, I had started tinkering on it in my spare time in a separate
> >>>> repo.  There really isn't much to collaborate on at this point.  I was
> >>>> mostly trying to understand the parallels between Flink and Spark so
> >>>> that I could understand how a FlinkGraphComputer could be implemented
> >>>> given what I'd seen of the Spark implementation Marko did.  I had
> >>>> expected to contribute the work to Flink (rather than keep it here on
> >>>> the TinkerPop side).  Anyway, not much else to offer - Marko can
> >>>> probably get you running much faster than I can, as that area is where
> >>>> he holds the most expertise. You should probably keep an eye out for
> >>>> his comments.
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi James and TinkerPop community,
> >>>>>
> >>>>> thanks a lot for starting this discussion!
> >>>>> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet
> >>>>> you ;)
> >>>>>
> >>>>> I'm only starting to get familiar with the TinkerPop project, but it
> >>>>> seems that it can play well with Flink.
> >>>>> As you already noticed, a FlinkGraphComputer should be
> >>>>> straightforward to implement. Gelly has a vertex-centric API that is
> >>>>> similar to the scatter-gather model [1] and a gather-sum-apply API [2]
> >>>>> that is closer to the PowerGraph model. These are built on top of
> >>>>> Flink's delta iteration operators [3], which are more generic and could
> >>>>> also be used directly for the FlinkGraphComputer if the existing Gelly
> >>>>> abstractions don't work.
> >>>>>
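To make the scatter-gather idea above a bit more concrete, here is a minimal plain-Python sketch of one vertex-centric computation (single-source shortest paths). It does not use Gelly or TinkerPop: the function name `scatter_gather_sssp` and the edge-tuple layout are made up for illustration, and the "active set" only loosely imitates the workset of a delta iteration.

```python
# Plain-Python sketch of a scatter-gather (vertex-centric) iteration.
# All names here are hypothetical; this is not the Gelly or TinkerPop API.

def scatter_gather_sssp(edges, source, max_supersteps=100):
    """Single-source shortest paths over (src, dst, weight) edge tuples."""
    vertices = {v for src, dst, _ in edges for v in (src, dst)} | {source}
    distance = {v: float("inf") for v in vertices}
    distance[source] = 0.0

    out_edges = {v: [] for v in vertices}
    for src, dst, weight in edges:
        out_edges[src].append((dst, weight))

    # Only vertices whose value changed take part in the next superstep,
    # which loosely mirrors the workset idea behind a delta iteration.
    active = {source}
    for _ in range(max_supersteps):
        if not active:
            break
        # Scatter: each active vertex sends a candidate distance to its neighbours.
        inbox = {}
        for v in active:
            for dst, weight in out_edges[v]:
                inbox.setdefault(dst, []).append(distance[v] + weight)
        # Gather: each receiving vertex keeps the smallest candidate if it improves.
        active = set()
        for v, candidates in inbox.items():
            best = min(candidates)
            if best < distance[v]:
                distance[v] = best
                active.add(v)
    return distance
```

Calling it with a list such as `[("a", "b", 1.0), ("b", "c", 2.0)]` and source `"a"` returns a dict of distances per vertex; the loop stops as soon as no vertex changes its value.
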
> >>>>> Regarding the difference between stream and batch in Flink: Flink is
> >>>>> a streaming dataflow engine, on top of which you can run both
> >>>>> streaming and batch jobs. Flink simply sees a batch job as a job
> >>>>> operating on a finite stream. Accordingly, Flink has both a stream API
> >>>>> and a batch API. Gelly is currently built on top of the batch API,
> >>>>> i.e. the DataSet API.
> >>>>>
> >>>>> James mentioned on the Flink mailing list that someone has already
> >>>>> started working on a FlinkGraphComputer. Is there a JIRA for this? Let
> >>>>> me know if you have questions or you think I can help in some way!
> >>>>>
> >>>>> Cheers,
> >>>>> -Vasia.
> >>>>>
> >>>>> [1]:
> >>>>> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> >>>>> [2]:
> >>>>> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> >>>>> [3]:
> >>>>> https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
> >>>>>
> >>>>> On 25 November 2015 at 17:05, James Thornton <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Vasia -
> >>>>>>
> >>>>>> Welcome to TinkerPop (linking you into the Flink thread as
> >>>>>> requested)...
> >>>>>>
> >>>>>> - James
> >>>>>>
> >>>>>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi James,
> >>>>>>>
> >>>>>>> Thank you for always having an ear to the tech pulse. If it wasn't
> >>>>>>> for you, I would still be excited about XMPP and would be
> >>>>>>> programming in Tcl/Tk.
> >>>>>>>
> >>>>>>> Given my 20-minute review of their docs ... It would be cool if,
> >>>>>>> like the "Table API," they also had a "Graph API" that was just
> >>>>>>> TinkerPop Graph/Vertex/Edge. That could be super intrusive, so as a
> >>>>>>> simple step -- they already have a "vertex-centric" API and thus,
> >>>>>>> having a FlinkGraphComputer implementation seems "easy." Then from
> >>>>>>> there, Gremlin should just work. I don't really understand the
> >>>>>>> difference between stream and batch, unless they are talking about
> >>>>>>> the difference between "Storm" and "MapReduce"? Would be cool to see
> >>>>>>> how TinkerPop fits into the stream-scene.
> >>>>>>>
> >>>>>>> Next, their fluent API is similar to Spark's and I would argue that
> >>>>>>> Gremlin's API is much nicer than just low-level primitives like
> >>>>>>> map(), flatMap(), etc. Thus, they could really benefit from having a
> >>>>>>> full graph query language already available for their users. (As a
> >>>>>>> side note, it's really nice to see more and more systems use
> >>>>>>> functional/fluent APIs, as this really trains the next generation to
> >>>>>>> think like this, which is important as Gremlin is purely this!
> >>>>>>> Hopefully the SQL model of querying starts to look odd to people in
> >>>>>>> comparison.)
> >>>>>>>
> >>>>>>> I just sent out this tweet:
> >>>>>>>
> >>>>>>> https://twitter.com/apachetinkerpop/status/668820458599530497
> >>>>>>>
> >>>>>>> If they seem positive, I can detail in JIRA what would be required
> >>>>>>> for them to have TinkerPop-support.
> >>>>>>>
> >>>>>>> Thanks again James,
> >>>>>>> Marko.
> >>>>>>>
> >>>>>>> http://markorodriguez.com
> >>>>>>>
> >>>>>>> On Nov 19, 2015, at 12:19 PM, James Thornton <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi -
> >>>>>>>>
> >>>>>>>> Apache Flink has a graph API named Gelly...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> >>>>>>>>
> >>>>>>>> ...and Flink's "dedicated support for iterative operations" should
> >>>>>>>> pair well with Gremlin:
> >>>>>>>>
> >>>>>>>> https://flink.apache.org/features.html
> >>>>>>>>
> >>>>>>>> Has anyone dug into this yet?
> >>>>>>>>
> >>>>>>>> - James
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> James Thornton, http://electricspeed.com
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> James Thornton, http://electricspeed.com
> >>>>>>
> >>>>>
> >>>
> >>>
> >>
>
>
