Re: [DISCUSS] TinkerPop 3.1.2 and 3.2.0 Planning

Ted Wilmes Tue, 02 Feb 2016 07:25:28 -0800

Hi guys,
Here are a few things I've been thinking about for 3.1.2 and 3.2.

TinkerPop 3.1.2
* TINKERPOP-1016 - finish first set of JMH benchmarks
* TINKERPOP-965 - optimize strategy application
* Study Ferma/TinkerPop benchmark, see where the performance deltas are
coming from, and create tickets as necessary


TinkerPop 3.2
* I'd like to familiarize myself with the OLAP side of things and hopefully
begin to help out a bit with those tickets.

Profiling results ultimately need to drive targeted performance
improvements, but I've been thinking about experimenting with a few more
"out-there" ideas:
* Explore the possibility of introducing code generation into certain steps
to cut down on traversal execution overhead.  Granted, the gains would need
to outweigh the cost of compiling the generated code.  Other folks have had
good success with this technique in certain scenarios.  See the code
generation portion of Spark's Tungsten project for one example [1].
* Take code generation one step further and explore a "code generation"
strategy that, given a traversal (maybe only simple ones to start),
generates pure Java code which is then compiled on the fly and executed in
a highly performant manner.
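
To make the "generate Java, compile on the fly" idea a bit more concrete,
here is a minimal sketch using the JDK's own javax.tools compiler. It does
not touch any TinkerPop API: Step, Kind, CodeGenSketch, and the generated
GeneratedPipeline class are hypothetical stand-ins for a very simple
"traversal" over a long[] (filter, then multiply, then sum). It assumes the
code runs on a full JDK, since ToolProvider.getSystemJavaCompiler() returns
null on a bare JRE.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch: none of these types are TinkerPop classes.
final class CodeGenSketch {

    enum Kind { FILTER_GREATER_THAN, MULTIPLY_BY }

    static final class Step {
        final Kind kind;
        final long operand;
        Step(Kind kind, long operand) { this.kind = kind; this.operand = operand; }
    }

    // Baseline: re-inspect the step list for every element (per-step dispatch).
    static long interpret(List<Step> steps, long[] input) {
        long sum = 0L;
        outer:
        for (long x : input) {
            for (Step s : steps) {
                if (s.kind == Kind.FILTER_GREATER_THAN) {
                    if (x <= s.operand) continue outer;
                } else {
                    x *= s.operand;
                }
            }
            sum += x;
        }
        return sum;
    }

    // Emit a single fused loop with every step inlined as straight-line code.
    static String generateSource(String className, List<Step> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className)
           .append(" implements java.util.function.ToLongFunction<long[]> {\n")
           .append("  public long applyAsLong(long[] input) {\n")
           .append("    long sum = 0L;\n")
           .append("    for (long x : input) {\n");
        for (Step s : steps) {
            if (s.kind == Kind.FILTER_GREATER_THAN) {
                src.append("      if (x <= ").append(s.operand).append("L) continue;\n");
            } else {
                src.append("      x *= ").append(s.operand).append("L;\n");
            }
        }
        src.append("      sum += x;\n    }\n    return sum;\n  }\n}\n");
        return src.toString();
    }

    // Write the source to a temp directory, compile it, then load and instantiate it.
    @SuppressWarnings("unchecked")
    static ToLongFunction<long[]> compile(List<Step> steps) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (a JDK is required)");
        }
        String className = "GeneratedPipeline";
        try {
            Path dir = Files.createTempDirectory("codegen-sketch");
            Path file = dir.resolve(className + ".java");
            Files.write(file, generateSource(className, steps).getBytes(StandardCharsets.UTF_8));
            int exit = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exit != 0) {
                throw new IllegalStateException("Compilation of generated code failed");
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> cls = loader.loadClass(className);
                return (ToLongFunction<long[]>) cls.getDeclaredConstructor().newInstance();
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not build generated pipeline", e);
        }
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(
                new Step(Kind.FILTER_GREATER_THAN, 2),
                new Step(Kind.MULTIPLY_BY, 10));
        long[] input = {1, 2, 3, 4};
        System.out.println(interpret(steps, input));            // prints 70
        System.out.println(compile(steps).applyAsLong(input));  // prints 70
    }
}
```

The compile step here costs far more than running a four-element input, so
any real experiment would need to cache the compiled class per traversal
and measure (e.g. with JMH) whether the fused loop actually wins.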

--Ted

[1]
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html

On Mon, Feb 1, 2016 at 5:01 PM, Marko Rodriguez <[email protected]>
wrote:

> Hi,
>
> Please bring this up on the respective ticket and we can discuss there.
> This way we don't steal this thread from 3.1.2 and 3.2.0 planning.
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
>
> On Feb 1, 2016, at 2:30 PM, Marvin Froeder <[email protected]> wrote:
>
> > Any plans on making the method return types generic so we can specialize them?
> >
> > For instance, instead of
> > public interface Graph {
> >     public Iterator<Vertex> vertices(final Object... vertexIds);
> > }
> > to have
> > public interface Graph<V extends Vertex> {
> >     public Iterator<V> vertices(final Object... vertexIds);
> > }
> >
> >
> > That way, orientdb-gremlin can expose custom operations and even enforce
> > types for things like Element.id(), among many other creative ideas =D
> >
> >
> > On Tue, Feb 2, 2016 at 3:52 AM, Marko Rodriguez <[email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >> I think 3.2.0 can include breaking changes if need be. However, I believe
> >> all the things that I want to do will have @Deprecated backwards
> >> compatible solutions.
> >>
> >> Marko.
> >>
> >> http://markorodriguez.com
> >>
> >> On Feb 1, 2016, at 4:25 AM, Stephen Mallette <[email protected]> wrote:
> >>
> >>> Is 3.2.0 going to be considered a "breaking" version in the sense that we
> >>> need to alter some APIs? Or will it be possible to do 3.2.0 without that?
> >>> I'm in favor of a breaking version for 3.2.0 so that we can try to clean up
> >>> some old code especially if we have other changes driving that.
> >>>
> >>> On Sat, Jan 30, 2016 at 7:55 PM, Marko Rodriguez <[email protected]>
> >>> wrote:
> >>>
> >>>> Hello Pieter,
> >>>>
> >>>>> A tad selfish I know,
> >>>>> but https://issues.apache.org/jira/browse/TINKERPOP-968 is what I am
> >>>>> waiting for.
> >>>>
> >>>> The things I listed are what I care about and what I plan to work on. If
> >>>> you have things you care about, you can work on those. If you are unsure of
> >>>> a development strategy, perhaps you can get others excited about your idea
> >>>> with a [DISCUSS], work through pros/cons, get some buy in, etc. From there,
> >>>> develop the idea, test it, document it, and ultimately provide a PR to get
> >>>> it merged into a release line.
> >>>>
> >>>>       http://tinkerpop.apache.org/docs/3.1.1-SNAPSHOT/dev/developer/
> >>>>
> >>>> SIDENOTE: A few people emailed me personally saying comments to the
> >>>> effect: "Please deliver X, Y, Z feature." Note, if you want something done,
> >>>> do it. If you don't know how to do it, learn it. If you don't know how to
> >>>> learn it, ask and we can point you in the right direction. If you don't
> >>>> know how to ask -- I know you are lying cause you asked me to deliver X, Y,
> >>>> Z. Gotcha!
> >>>>
> >>>> Take care,
> >>>> Marko.
> >>>>
> >>>> http://markorodriguez.com
> >>>>
> >>>>>
> >>>>> Cheers
> >>>>> Pieter
> >>>>>
> >>>>> On 30/01/2016 19:09, Marko Rodriguez wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> With TinkerPop 3.1.1 about to be put up for VOTE, we can start to turn
> >>>>>> our attention towards 3.1.2 and 3.2.0.
> >>>>>>
> >>>>>> I was thinking it would be good to have a planning session to organize
> >>>>>> JIRA and discuss order of operations. However, JIRA planning sessions are a
> >>>>>> bit boring as they are too "nitty gritty," so perhaps we can use this
> >>>>>> thread to discuss what we (as individuals) would like to accomplish for
> >>>>>> 3.1.2 and 3.2.0 in general. This way, we have more summaries of everyone's
> >>>>>> desires and then the specifics can be shaken out in JIRA. As such, here
> >>>>>> are my desires:
> >>>>>>
> >>>>>> TinkerPop 3.1.2
> >>>>>>    * Test a new shuffle optimization idea in SparkGraphComputer and
> >>>>>>      if it's efficient, use it.
> >>>>>>    * Benchmark GiraphGraphComputer at scale and optimize it where
> >>>>>>      need be.
> >>>>>>
> >>>>>> TinkerPop 3.2.0
> >>>>>>    * Gremlin DSLs -- e.g.
> >>>>>>      social.people().aged(36).who().know().person("daniel").who().worksFor().company("cisco")
> >>>>>>    * TraversalSource API redesign. g =
> >>>>>>      graph.traversal().withComputer(…).withStrategy(…).withBulk(…). The current
> >>>>>>      TraversalSourceBuilder model is horrible.
> >>>>>>    * OLTP/OLAP-mixed traversal -- e.g.
> >>>>>>      OLAP[g.V().out()]OLTP[limit(10)]OLAP[out().values("name").order()]OLTP[sample(1)]
> >>>>>>    * GraphComputer API additions for intelligent data access -- e.g.
> >>>>>>      g.V().count() does not need to grab all the edges of the graph!
> >>>>>>    * Bulking beyond Long -- support BigInteger, Complex numbers,
> >>>>>>      Doubles, etc.
> >>>>>>    * Redesign TraverserRequirements -- this is a rat's nest that
> >>>>>>      didn't really work out as planned and it's inefficient. I think I can make
> >>>>>>      this a lot simpler.
> >>>>>>    * ServerGraph/ServerStep/ServerStrategy -- like OLAP, but for
> >>>>>>      GremlinServer -- e.g. [GraphStep, VertexStep, ServerStep] (collaborate with
> >>>>>>      GremlinServer people on this).
> >>>>>>    * Scope.local & Scope.global rethinking -- count(local),
> >>>>>>      dedup(local) … too many -- this is not manageable! What about
> >>>>>>      g.V().groupCount().inside(order().limit(10)) instead of
> >>>>>>      g.V().groupCount().order(local).limit(local,10)?
> >>>>>>    * Clean up HadoopGraph configurations -- Why do we have
> >>>>>>      gremlin.spark.graphInputRDD and gremlin.hadoop.graphInputFormat? We should
> >>>>>>      just have one configuration: gremlin.hadoop.graphInputClass.
> >>>>>>    * Publish a tutorial on the Gremlin VM and compiling other
> >>>>>>      languages to it. I would really like to have the gremlin-examples/ package
> >>>>>>      that Jason/Stephen were talking about.
> >>>>>>    * Optimize Gryo serialization and SparkGraphComputer's
> >>>>>>      GryoSerializer.
> >>>>>>
> >>>>>> Those are the big ticket items that I would like to get handled for the
> >>>>>> next versions of TinkerPop.
> >>>>>>
> >>>>>> What are your thoughts on these and what are your thoughts on what you
> >>>>>> plan to accomplish in this next push?
> >>>>>>
> >>>>>> Take care,
> >>>>>> Marko.
> >>>>>>
> >>>>>> http://markorodriguez.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>
