Hi Vasia,

I had started tinkering with it in my spare time in a separate repo. There really isn't much to collaborate on at this point. I was mostly trying to understand the parallels between Flink and Spark so that I could understand how a FlinkGraphComputer could be implemented given what I'd seen of the Spark implementation Marko did. I had expected to contribute the work to Flink (rather than keep it here on the TinkerPop side). Anyway, not much else to offer - Marko can probably get you running much faster than I can, as that area is where he holds the most expertise. You should probably keep an eye out for his comments.
On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]> wrote:

> Hi James and TinkerPop community,
>
> thanks a lot for starting this discussion!
> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet you ;)
>
> I'm only starting to get familiar with the TinkerPop project, but it seems
> that it can play well with Flink.
> As you already noticed, a FlinkGraphComputer should be straightforward to
> implement. Gelly has a vertex-centric API that is similar to the
> scatter-gather model [1] and a gather-sum-apply API [2] that is closer to
> the PowerGraph model. These are built on top of Flink's delta iteration
> operators [3], which are more generic and could also be used directly for
> the FlinkGraphComputer, if the existing Gelly abstractions won't work.
>
> Regarding the difference between stream and batch in Flink: Flink is a
> streaming dataflow engine, on top of which you can run both streaming and
> batch jobs. A batch job is simply seen by Flink as a job operating on a
> finite stream. Accordingly, Flink has a stream and a batch API. Gelly is
> currently built on top of the batch API, i.e. the DataSet API.
>
> James mentioned in the Flink mailing list that someone has already started
> working on a FlinkGraphComputer. Is there a JIRA for this? Let me know if
> you have questions or you think I can help in some way!
>
> Cheers,
> -Vasia.
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> [2]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> [3]: https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
>
> On 25 November 2015 at 17:05, James Thornton <[email protected]>
> wrote:
>
> > Hi Vasia -
> >
> > Welcome to TinkerPop (linking you into the Flink thread as requested)...
> >
> > - James
> >
> > On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <[email protected]>
> > wrote:
> >
> > > Hi James,
> > >
> > > Thank you for always having an ear to the tech pulse. If it wasn't for
> > > you, I would still be excited about XMPP and would be programming in
> > > Tcl/Tk.
> > >
> > > Given my 20 minute review of their docs …… It would be cool if, like the
> > > "Table API," they also had a "Graph API" that was just TinkerPop
> > > Graph/Vertex/Edge. That could be super intrusive, so as a simple step --
> > > they already have a "vertex-centric" API and thus, having a
> > > FlinkGraphComputer implementation seems "easy." Then from there, Gremlin
> > > should just work. I don't really understand the difference between
> > > stream and batch unless they are talking about the difference between
> > > "Storm" and "MapReduce"? Would be cool to see how TinkerPop fits into
> > > the stream-scene.
> > >
> > > Next, their fluent API is similar to Spark's and I would argue that
> > > Gremlin's API is much nicer than just low-level primitives like map(),
> > > flatMap(), etc. Thus, they could really benefit from having a full graph
> > > query language already available for their users. (As a side note, it's
> > > really nice to see more and more systems use functional/fluent APIs as
> > > this really trains the next generation to think like this, which is
> > > important as Gremlin is purely this! Hopefully the SQL model of querying
> > > starts to look odd to people in comparison.)
> > >
> > > I just sent out this tweet:
> > > https://twitter.com/apachetinkerpop/status/668820458599530497
> > >
> > > If they seem positive, I can detail in JIRA what would be required for
> > > them to have TinkerPop-support.
> > >
> > > Thanks again James,
> > > Marko.
> > >
> > > http://markorodriguez.com
> > >
> > > On Nov 19, 2015, at 12:19 PM, James Thornton <[email protected]>
> > > wrote:
> > >
> > > > Hi -
> > > >
> > > > Apache Flink has a graph API named Gelly...
> > > >
> > > > https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> > > >
> > > > ...and Flink's "dedicated support for iterative operations" should pair
> > > > well with Gremlin:
> > > >
> > > > https://flink.apache.org/features.html
> > > >
> > > > Has anyone dug into this yet?
> > > >
> > > > - James
> > > >
> > > > --
> > > > James Thornton, *http://electricspeed.com <http://electricspeed.com>*
> >
> > --
> > James Thornton, *http://electricspeed.com <http://electricspeed.com>*
