Re: How to do Partitions correctly in 3.3.0?

gmail Sat, 14 Jan 2017 10:07:50 -0800

Hi,

I am still struggling to understand how this might work for Sqlg.

Postgresql has some support for partitioning.

Basically, a table can be partitioned either by specifying ranges for some
column(s) or by listing values, and then Postgresql does its magic.
Partitions are only defined logically at the meta level.
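To make the two styles concrete, here is a minimal Java sketch under a simplified model. It is not how Postgresql implements partitioning. It only illustrates that partitions are metadata mapping a column value to the partition that stores it, either through an explicit value list or through range bounds. The class names and partition names (person_nl, stuff_2016 and so on) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Conceptual model only, NOT Postgresql internals: partitions are pure metadata
// that map a column value to the name of the partition that stores it.
public class PartitionRoutingSketch {

    // List partitioning: each partition owns an explicit set of values.
    static final class ListPartitioning {
        private final Map<String, String> valueToPartition = new HashMap<>();

        void addPartition(String partitionName, String... values) {
            for (String v : values) {
                valueToPartition.put(v, partitionName);
            }
        }

        // The single partition a predicate like "country = value" needs to scan.
        Optional<String> route(String value) {
            return Optional.ofNullable(valueToPartition.get(value));
        }
    }

    // Range partitioning: each partition owns values from its lower bound
    // (inclusive) up to the next partition's lower bound; the last one is open-ended.
    static final class RangePartitioning {
        private final TreeMap<Integer, String> lowerBoundToPartition = new TreeMap<>();

        void addPartition(String partitionName, int lowerBoundInclusive) {
            lowerBoundToPartition.put(lowerBoundInclusive, partitionName);
        }

        Optional<String> route(int value) {
            Map.Entry<Integer, String> entry = lowerBoundToPartition.floorEntry(value);
            return entry == null ? Optional.empty() : Optional.of(entry.getValue());
        }
    }

    public static void main(String[] args) {
        ListPartitioning byCountry = new ListPartitioning();
        byCountry.addPartition("person_nl", "Netherlands");
        byCountry.addPartition("person_be", "Belgium");
        System.out.println(byCountry.route("Belgium")); // Optional[person_be]

        RangePartitioning byYear = new RangePartitioning();
        byYear.addPartition("stuff_2016", 2016);
        byYear.addPartition("stuff_2017", 2017);
        System.out.println(byYear.route(2016)); // Optional[stuff_2016]
    }
}
```

In this model, a query with an equality predicate on the partitioning column only ever needs to look at the one partition that route() returns.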

e.g.
Country -> Person -> Stuff
where the country’s name is also duplicated on the Person vertex.

I want to partition the graph by country. Countries are not predefined
but are added to the graph at runtime.
Does this mean that when a new country is created, the code must create a
new Partition, with a |contains(element)| like |contains(element) {return
element.value("country").equals(this.country);}|?
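A minimal Java sketch of what this could look like, assuming partitions are created lazily the first time a new country value is seen. SimpleElement, CountryPartition and CountryPartitionRegistry are hypothetical names, not Sqlg or TinkerPop API, and an element is reduced to a label plus a property map.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these classes are Sqlg or TinkerPop API.
public class CountryPartitionSketch {

    // Stand-in for a graph element: a label and a property map.
    static final class SimpleElement {
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        SimpleElement(String label, String country) {
            this.label = label;
            properties.put("country", country);
        }

        Object value(String key) {
            return properties.get(key);
        }
    }

    // One logical partition per country; membership is decided by the
    // "country" property duplicated on the Person vertex.
    static final class CountryPartition {
        final String country;

        CountryPartition(String country) {
            this.country = country;
        }

        boolean contains(SimpleElement element) {
            // Compare the property value, not a Property object.
            return country.equals(element.value("country"));
        }
    }

    // Countries are only known at runtime, so a partition is created
    // the first time a new country value shows up.
    static final class CountryPartitionRegistry {
        private final Map<String, CountryPartition> partitions = new LinkedHashMap<>();

        CountryPartition partitionFor(SimpleElement element) {
            String country = (String) element.value("country");
            return partitions.computeIfAbsent(country, CountryPartition::new);
        }

        int size() {
            return partitions.size();
        }
    }

    public static void main(String[] args) {
        CountryPartitionRegistry registry = new CountryPartitionRegistry();
        SimpleElement alice = new SimpleElement("Person", "Netherlands");
        SimpleElement bob = new SimpleElement("Person", "Belgium");
        SimpleElement carol = new SimpleElement("Person", "Netherlands");

        CountryPartition nl = registry.partitionFor(alice);
        registry.partitionFor(bob);
        registry.partitionFor(carol);

        System.out.println(registry.size());    // 2
        System.out.println(nl.contains(carol)); // true
        System.out.println(nl.contains(bob));   // false
    }
}
```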

Ultimately, will the Gremlin query |g.V().hasLabel("Person").out()| run one
thread per “country”?

For this to work on Postgresql, the given Gremlin will have to become
|g.V().hasLabel("Person").has("country", "x").out()|, where “x” is the
country specified per thread.
If all this happens, Postgresql will hit the partitions correctly.
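A hypothetical in-memory sketch of the one-thread-per-country idea. Each task evaluates the equivalent of |g.V().hasLabel("Person").has("country", "x").out()| for its own “x”. Person and outPerCountry are illustrative names only. A real Sqlg implementation would push the has("country", x) predicate down into SQL rather than scan a Java list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory model, not Sqlg or TinkerPop API: one task per country,
// each evaluating g.V().hasLabel("Person").has("country", x).out() for its own x.
public class PerCountryTraversalSketch {

    static final class Person {
        final String name;
        final String country;             // duplicated from the Country vertex
        final List<String> outNeighbours; // ids of adjacent "Stuff" vertices

        Person(String name, String country, List<String> outNeighbours) {
            this.name = name;
            this.country = country;
            this.outNeighbours = outNeighbours;
        }
    }

    static Map<String, List<String>> outPerCountry(List<Person> persons) {
        Set<String> countries = new TreeSet<>();
        for (Person p : persons) {
            countries.add(p.country);
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, countries.size()));
        try {
            Map<String, Future<List<String>>> futures = new TreeMap<>();
            for (String country : countries) {
                // The has("country", x) filter: in a database, this equality predicate
                // is what lets the planner touch only the partition for x.
                futures.put(country, pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (Person p : persons) {
                        if (p.country.equals(country)) {
                            out.addAll(p.outNeighbours); // the out() step
                        }
                    }
                    return out;
                }));
            }
            Map<String, List<String>> result = new TreeMap<>();
            for (Map.Entry<String, Future<List<String>>> e : futures.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("alice", "Netherlands", Arrays.asList("stuff1", "stuff2")),
                new Person("bob", "Belgium", Arrays.asList("stuff3")),
                new Person("carol", "Netherlands", Arrays.asList("stuff4")));
        System.out.println(outPerCountry(persons));
        // {Belgium=[stuff3], Netherlands=[stuff1, stuff2, stuff4]}
    }
}
```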

Thanks
Pieter

On Thu, Jan 12, 2017 at 7:17, Marko Rodriguez wrote:

> Hello,
>
> One of the things that I’m working on for TinkerPop 3.3.0 is the
> ability for any GraphComputer or GraphActors to work against any
> Graph. That is, TinkerGraphComputer over Neo4jGraph,
> SparkGraphComputer over TinkerGraph, AkkaGraphActors over HadoopGraph,
> etc. In order to do this, we needed the concept of Graph partitions. A
> Graph partition simply allows you to iterate over all the vertices and
> edges in a “partition” (a subgraph of the full graph). Moreover, it
> has methods to check for the existence of an element in that
> partition.
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partition.java
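A minimal Java sketch of the idea described above, assuming vertices are identified by long ids and that an edge belongs to the partition of its out-vertex. Partition, Edge and VertexSetPartition here are hypothetical stand-ins. The actual methods of the linked Partition.java may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the linked Partition.java: vertices are long ids and an
// edge is assumed to belong to the partition of its out-vertex.
public class PartitionSketch {

    static final class Edge {
        final long outId;
        final long inId;

        Edge(long outId, long inId) {
            this.outId = outId;
            this.inId = inId;
        }

        @Override
        public String toString() {
            return outId + "->" + inId;
        }
    }

    // A partition exposes only its own slice (subgraph) of the full graph.
    interface Partition {
        Iterator<Long> vertices();
        Iterator<Edge> edges();
        boolean contains(long vertexId);
    }

    static final class VertexSetPartition implements Partition {
        private final Set<Long> vertexIds;
        private final List<Edge> allEdges;

        VertexSetPartition(Set<Long> vertexIds, List<Edge> allEdges) {
            this.vertexIds = new TreeSet<>(vertexIds); // sorted for stable iteration
            this.allEdges = allEdges;
        }

        @Override
        public Iterator<Long> vertices() {
            return vertexIds.iterator();
        }

        @Override
        public Iterator<Edge> edges() {
            List<Edge> owned = new ArrayList<>();
            for (Edge e : allEdges) {
                if (vertexIds.contains(e.outId)) {
                    owned.add(e);
                }
            }
            return owned.iterator();
        }

        @Override
        public boolean contains(long vertexId) {
            return vertexIds.contains(vertexId);
        }
    }

    public static void main(String[] args) {
        List<Edge> graphEdges = Arrays.asList(new Edge(1, 2), new Edge(2, 3), new Edge(3, 1));
        Partition p = new VertexSetPartition(new HashSet<>(Arrays.asList(1L, 2L)), graphEdges);
        p.edges().forEachRemaining(System.out::println); // prints 1->2 then 2->3
        System.out.println(p.contains(3L));              // false
    }
}
```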
> Next, we have the concept of a Partitioner, which contains all the
> Partitions.
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java
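A hypothetical Java sketch of a Partitioner holding all Partitions, together with a hash split in the spirit of the HashPartitioner(graph.partitioner(), 5) example discussed in this email. All class names are illustrative, not the code in the linked Partitioner.java. Vertices are reduced to long ids, and each id goes to bucket floorMod(hash(id), n).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: illustrative stand-ins, not TinkerPop's actual classes.
public class PartitionerSketch {

    // Minimal partition: just the vertex ids it owns.
    static final class SimplePartition {
        final Set<Long> vertexIds;

        SimplePartition(Set<Long> vertexIds) {
            this.vertexIds = vertexIds;
        }

        boolean contains(long id) {
            return vertexIds.contains(id);
        }
    }

    // A Partitioner holds every partition of the graph.
    interface Partitioner {
        List<SimplePartition> partitions();

        // Locates the partition that owns a vertex, if any.
        default Optional<SimplePartition> find(long id) {
            for (SimplePartition p : partitions()) {
                if (p.contains(id)) {
                    return Optional.of(p);
                }
            }
            return Optional.empty();
        }
    }

    // The "default partitioner" of a single-machine graph: one partition.
    static final class SingletonPartitioner implements Partitioner {
        private final List<SimplePartition> parts;

        SingletonPartitioner(Set<Long> allVertexIds) {
            parts = Collections.singletonList(new SimplePartition(allVertexIds));
        }

        @Override
        public List<SimplePartition> partitions() {
            return parts;
        }
    }

    // Re-splits whatever the base partitioner holds into n hash buckets,
    // so that n threads can each process one bucket.
    static final class HashPartitioner implements Partitioner {
        private final List<SimplePartition> parts = new ArrayList<>();

        HashPartitioner(Partitioner base, int splits) {
            if (splits < 1) {
                throw new IllegalArgumentException("splits must be >= 1");
            }
            List<Set<Long>> buckets = new ArrayList<>();
            for (int i = 0; i < splits; i++) {
                buckets.add(new TreeSet<>());
            }
            for (SimplePartition p : base.partitions()) {
                for (long id : p.vertexIds) {
                    buckets.get(Math.floorMod(Long.hashCode(id), splits)).add(id);
                }
            }
            for (Set<Long> b : buckets) {
                parts.add(new SimplePartition(b));
            }
        }

        @Override
        public List<SimplePartition> partitions() {
            return parts;
        }
    }

    public static void main(String[] args) {
        Set<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        Partitioner split = new HashPartitioner(new SingletonPartitioner(ids), 5);
        System.out.println(split.partitions().size());      // 5
        System.out.println(split.find(7L).get().vertexIds); // [2, 7]
    }
}
```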
> The problem I’m struggling with right now is how to specify a
> Partitioner. For instance, right now Graph.partitioner() returns the
> “default partitioner” for the Graph. For TinkerGraph, this partitioner
> has one partition (since a TinkerGraph is single-machine). However,
> it is possible to create splits so that processing can be threaded,
> e.g. 5 concurrent threads processing a TinkerGraph in
> TinkerGraphComputer. How is this done?
>
> partitioner = new HashPartitioner(graph.partitioner(), 5)
>
> It sorta sucks to have to start specifying such things
> programmatically. I was thinking it would be nice to have
> Graph.partitioner() do all the work. For instance:
>
> Graph.partitioner(Function<Partitioner,Partitioner> partitionerMaker)
>
> The above function would start with the “default partitioner” of the
> Graph and then create new partitioners from that. Now, we can have
> some “non-lambda” based default functions for people to use.
>
> Graph.partitioner(Maker.splits(5).centricity(Vertex.class))
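A minimal Java sketch of the proposed Graph.partitioner(Function<Partitioner,Partitioner>) idea, assuming a toy single-machine graph. ToyGraph, Partitioner, Maker and the Centricity enum (standing in for the Vertex.class argument) are hypothetical names. The point is that Maker is plain data plus apply(), so it is Serializable and can be rendered as configuration properties, unlike an arbitrary lambda.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch of Graph.partitioner(Function<Partitioner,Partitioner>).
// Centricity stands in for the Vertex.class argument.
public class PartitionerMakerSketch {

    enum Centricity { VERTEX, EDGE }

    static final class Partitioner {
        final List<Set<Long>> partitions;
        final Centricity centricity;

        Partitioner(List<Set<Long>> partitions, Centricity centricity) {
            this.partitions = partitions;
            this.centricity = centricity;
        }
    }

    // A "non-lambda" maker: plain serializable data plus apply().
    static final class Maker implements Function<Partitioner, Partitioner>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int splits;
        private final Centricity centricity;

        private Maker(int splits, Centricity centricity) {
            if (splits < 1) {
                throw new IllegalArgumentException("splits must be >= 1");
            }
            this.splits = splits;
            this.centricity = centricity;
        }

        static Maker splits(int n) {
            return new Maker(n, Centricity.VERTEX);
        }

        Maker centricity(Centricity c) {
            return new Maker(splits, c);
        }

        // Start from the graph's default partitioner and hash-split its vertices.
        @Override
        public Partitioner apply(Partitioner base) {
            List<Set<Long>> buckets = new ArrayList<>();
            for (int i = 0; i < splits; i++) {
                buckets.add(new TreeSet<>());
            }
            for (Set<Long> part : base.partitions) {
                for (long id : part) {
                    buckets.get(Math.floorMod(Long.hashCode(id), splits)).add(id);
                }
            }
            return new Partitioner(buckets, centricity);
        }

        // Configuration-style view of the maker.
        Map<String, String> toConfiguration() {
            Map<String, String> conf = new TreeMap<>();
            conf.put("partitioner.splits", Integer.toString(splits));
            conf.put("partitioner.centricity", centricity.name());
            return conf;
        }
    }

    // Single-machine toy graph: its default partitioner has exactly one partition.
    static final class ToyGraph {
        private final Set<Long> vertexIds = new TreeSet<>();

        ToyGraph(long vertexCount) {
            for (long i = 1; i <= vertexCount; i++) {
                vertexIds.add(i);
            }
        }

        Partitioner partitioner() {
            return new Partitioner(Collections.singletonList(vertexIds), Centricity.VERTEX);
        }

        Partitioner partitioner(Function<Partitioner, Partitioner> maker) {
            return maker.apply(partitioner());
        }
    }

    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph(10);
        Partitioner p = graph.partitioner(Maker.splits(5).centricity(Centricity.VERTEX));
        System.out.println(p.partitions.size()); // 5
        System.out.println(Maker.splits(5).toConfiguration());
        // {partitioner.centricity=VERTEX, partitioner.splits=5}
    }
}
```

Because Maker holds only an int and an enum, it could be serialized or rebuilt from configuration on a remote worker. A captured lambda gives no such guarantee unless its target type is itself Serializable.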
>
> Anywho… I’m trying to think it through so we have a nice, clean API
> with serializable and Configuration(able) objects.
>
> Thoughts?
> Marko.
> http://markorodriguez.com
