I have been working on this issue for a while, and I have finally opened a PR that I believe is the right direction (https://github.com/apache/incubator-s2graph/pull/112). Please review PR112 and give any feedback. Here are some important notes on this PR.
1. Data type of property value. Check out https://github.com/apache/incubator-s2graph/pull/112/files#diff-8caf8eace8a4d2a42e1b0279d531d286. Basically, we currently support only the data types that S2Graph already supported before. Supporting more data types is possible, but that should be a separate issue later, if necessary.

2. No notion of VertexProperty. Properties are the same on Vertex and Edge in S2Graph, so we have to decide what our S2VertexProperty will be. Are we going to support this, or just say we can't provide it (for now, or at all)? Check out https://github.com/apache/incubator-s2graph/pull/112/files#diff-b64b1af513f07d8e34fb498c7618cf67. Currently, only Cardinality.single vertex properties are supported.

3. Vertex Id: S2Graph uses ServiceColumn + UserProvidedId as its internal vertex Id. We need to decide how we are going to map ServiceColumn onto tp3's Vertex. Are we going to serialize/deserialize ServiceColumn into tp3's Vertex label or not? This is not just about ServiceColumn; I also want to discuss further what S2Graph is going to provide through tp3's interface, and how. Check out https://github.com/apache/incubator-s2graph/pull/112/files#diff-95ac55266df22a798b8f3ac2d9298ead; it specifies how S2Graph's VertexId/EdgeId are serialized/deserialized for tp3's id() method.

Also, here is how to run the tp3 test suite: just run the JUnit tests org.apache.s2graph.core.tinkerpop.structure.S2GraphStructureStandardTest and org.apache.s2graph.core.tinkerpop.process.S2GraphProcessStandardTest, without any setup. There are a lot of tests, so this will take some time. One thing I found useful for debugging is setting the environment variable GREMLIN_TESTS to a test class name, such as org.apache.tinkerpop.gremlin.structure.GraphTest; then only that test class runs.

Also, there are a few OptOuts on S2Graph. Most of them are there because I think it is not currently possible to pass those test cases. These are based solely on my own knowledge, so please ask if anything seems inappropriate.
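To make point 2 concrete, here is a minimal Python sketch of what "only Cardinality.single is supported" means for callers: each property key holds at most one value, setting it again replaces the old value, and list/set cardinality is rejected. The class `SingleCardinalityVertexSketch` and its methods are hypothetical and are not the PR's actual S2VertexProperty code; the enum only mirrors the three cardinality names TinkerPop defines.

```python
from enum import Enum
from typing import Any, Dict, List


class Cardinality(Enum):
    # Mirrors the three vertex-property cardinalities TinkerPop defines.
    single = "single"
    list = "list"
    set = "set"


class SingleCardinalityVertexSketch:
    """Hypothetical vertex that keeps at most one value per property key,
    i.e. the Cardinality.single-only behaviour described above."""

    def __init__(self) -> None:
        self._props: Dict[str, Any] = {}

    def property(self, key: str, value: Any,
                 cardinality: Cardinality = Cardinality.single) -> None:
        if cardinality is not Cardinality.single:
            # Multi-properties (list/set) have no counterpart in a flat
            # key -> value property model, so they are refused here.
            raise NotImplementedError(
                f"Cardinality.{cardinality.name} is not supported")
        # single: a new value replaces any existing value for the key.
        self._props[key] = value

    def values(self, key: str) -> List[Any]:
        # Return a list to match the shape of TinkerPop's values() step,
        # which may yield several values under list/set cardinality.
        return [self._props[key]] if key in self._props else []
```

Supporting list/set later would mean changing the storage model itself, not just this check, which is why the question of what S2VertexProperty should be is worth settling first.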
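For point 3, one option is to pack the ServiceColumn (service name + column name) and the user-provided id into a single opaque value returned from tp3's id(), so the id alone is enough to locate the vertex again, and optionally expose the column name as the tp3 label. The Python sketch below illustrates that round trip under those assumptions; `VertexIdSketch`, the JSON-array encoding, and the label choice are all hypothetical and are not the encoding used in PR112.

```python
import json
from dataclasses import dataclass
from typing import Union

UserId = Union[str, int]


@dataclass(frozen=True)
class VertexIdSketch:
    """Hypothetical stand-in for S2Graph's internal vertex id:
    a ServiceColumn (service + column name) plus the user-provided id."""
    service_name: str
    column_name: str
    user_id: UserId

    def to_tp3_id(self) -> str:
        # Serialize all three parts into one string so tp3's id() is
        # self-describing. JSON takes care of escaping, so separators
        # such as '.' or ':' inside names cannot corrupt the id.
        return json.dumps([self.service_name, self.column_name, self.user_id])

    @staticmethod
    def from_tp3_id(tp3_id: str) -> "VertexIdSketch":
        # Inverse of to_tp3_id(): recover the ServiceColumn and user id.
        service_name, column_name, user_id = json.loads(tp3_id)
        return VertexIdSketch(service_name, column_name, user_id)

    def tp3_label(self) -> str:
        # One possible answer to "should ServiceColumn become the label?":
        # expose only the column name as the tp3 vertex label (an assumption).
        return self.column_name


vid = VertexIdSketch("s2graph", "user", "marko")
encoded = vid.to_tp3_id()  # '["s2graph", "user", "marko"]'
assert VertexIdSketch.from_tp3_id(encoded) == vid
```

The trade-off is that a self-describing id is longer than the raw user id, while putting the ServiceColumn only in the label would force lookups by id to carry the label as well.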
Even though I believe that PR112 is a valid implementation of the tp3 interface, many things remain:

- TraversalStrategy: we do not have any provider optimizations yet (http://tinkerpop.apache.org/docs/current/reference/#traversalstrategy). I think there are a few optimizations we can provide. For example, `g.V(vid/v).outE` looks up the graph by vid/v and then returns all adjacent edges starting from that vertex. The current implementation in the PR uses Await once to wait for the I/O request to the storage backend for V(vid/v), and then Awaits the S2Vertex.edges method for the fetched vertex. This simply requires 2 I/O requests and 2 Awaits. In S2Graph, this query can be reduced to first creating the vertex to fetch in memory, then firing a single I/O request to the storage backend, which I think is more efficient. This is a very limited example, but I want to know what others think.

- Global index: check out http://markmail.org/message/2vn2bwrwh5zbeie4. While I was working on this issue, I noticed that S2Graph does not have an index provider layer for a global index. For example, for `g.V().has("name", "marko")` the current implementation has no global index provider, so it fetches all vertices and then checks whether each one has the property name equal to 'marko'. Check out http://tinkerpop.apache.org/docs/current/reference/#traversalstrategy. Basically, we need a layer that takes a traversal and then rewrites it to use the global index. How to build the global index is described at http://markmail.org/message/2vn2bwrwh5zbeie4.

- GremlinPlugin (https://issues.apache.org/jira/browse/S2GRAPH-148): for users to try out S2Graph through the TinkerPop APIs on Gremlin Console and Gremlin Server, I believe we should provide `S2GraphGremlinPlugin`.

- OLAP (GraphComputer) support: I have not gone through the GraphComputer parts yet (http://tinkerpop.apache.org/docs/current/reference/#graphcomputer), but I think S2Graph can benefit from tp3's OLAP framework.

All of the above needs some help from the community, which is currently very limited.
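The TraversalStrategy idea for `g.V(vid/v).outE` can be illustrated by counting round trips against a fake backend: the current plan fetches the vertex first and then its edges, while the proposed plan builds the vertex in memory from its id and issues only the edge request. This Python sketch is illustrative only; `FakeStorage`, `out_e_naive`, and `out_e_fused` are hypothetical names, not S2Graph or TinkerPop APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (edge label, target vertex id)


@dataclass
class FakeStorage:
    """Hypothetical in-memory backend that counts I/O requests."""
    vertices: Dict[str, dict]
    edges: Dict[str, List[Edge]]  # source vertex id -> outgoing edges
    io_requests: int = 0

    def fetch_vertex(self, vid: str) -> dict:
        self.io_requests += 1
        return self.vertices[vid]

    def fetch_out_edges(self, vid: str) -> List[Edge]:
        self.io_requests += 1
        return self.edges.get(vid, [])


def out_e_naive(storage: FakeStorage, vid: str) -> List[Edge]:
    # Current plan: V(vid) is one I/O request (and one Await), then
    # asking the fetched vertex for its edges is a second request.
    vertex = storage.fetch_vertex(vid)
    return storage.fetch_out_edges(vertex["id"])


def out_e_fused(storage: FakeStorage, vid: str) -> List[Edge]:
    # Proposed plan: construct the vertex in memory from its id alone
    # (no I/O), then fire a single edge request to the backend.
    vertex = {"id": vid}
    return storage.fetch_out_edges(vertex["id"])
```

One trade-off worth discussing: the fused plan never reads the vertex row, so it no longer checks that the vertex itself exists; whether that matches tp3's expected semantics for `V(id)` needs to be confirmed against the test suite.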
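For the global index point, the difference between today's behaviour and an index-backed rewrite of `g.V().has("name", "marko")` can be sketched as a full scan versus a lookup in an inverted (property key, value) to vertex-id index. This is a toy Python model under that assumption; `GlobalIndexSketch`, `has_full_scan`, and `has_with_index` are hypothetical, and the real index design is the one discussed in the markmail thread above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


class GlobalIndexSketch:
    """Hypothetical global index: (property key, value) -> vertex ids."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, Hashable], Set[str]] = defaultdict(set)

    def add_vertex(self, vid: str, props: Dict[str, Hashable]) -> None:
        # Index every property of the vertex so any key can be looked up.
        for key, value in props.items():
            self._postings[(key, value)].add(vid)

    def lookup(self, key: str, value: Hashable) -> Set[str]:
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self._postings.get((key, value), set()))


def has_full_scan(vertices: Dict[str, Dict[str, Hashable]],
                  key: str, value: Hashable) -> Tuple[List[str], int]:
    # What g.V().has(key, value) does today: read every vertex, then filter.
    scanned = 0
    hits: List[str] = []
    for vid, props in vertices.items():
        scanned += 1
        if props.get(key) == value:
            hits.append(vid)
    return sorted(hits), scanned


def has_with_index(index: GlobalIndexSketch,
                   key: str, value: Hashable) -> List[str]:
    # What a provider TraversalStrategy could rewrite V().has(...) into:
    # a direct index lookup that touches only matching vertex ids.
    return sorted(index.lookup(key, value))
```

The scan cost grows with the total number of vertices, while the lookup cost grows only with the number of matches; the price is keeping the index in sync on every property write.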
Please feel free to open an issue/discussion on the above or on anything else we should think about.

Best Regards.
DOYUNG YOON

On Thu, Nov 24, 2016 at 12:45 PM DO YUNG YOON wrote:
> Hi folks.
>
> After the discussion at ApacheCon Big Data Europe (Seville), I was wondering whether it is possible to change S2Graph's core library to implement tp3's interface directly, rather than providing a layer on top of the existing codebase.
>
> I have updated the corresponding issue <https://issues.apache.org/jira/browse/S2GRAPH-72> and created 2 sub-tasks (S2GRAPH-129 <https://issues.apache.org/jira/browse/S2GRAPH-129>, S2GRAPH-130 <https://issues.apache.org/jira/browse/S2GRAPH-130>) to try out this idea.
>
> @committers, please review PR99 <https://github.com/apache/incubator-s2graph/pull/99> and PR100 <https://github.com/apache/incubator-s2graph/pull/100> so we can proceed to actually implement all of tp3's interfaces. I intentionally left the actual implementation out because it may change after this discussion.
>
> Apart from that, here are a few things I want to discuss regarding support for Apache TinkerPop and Gremlin.
>
> 1. Data type of property value. Currently S2Graph only supports the types available in JSON. Is this OK? Are we going to support any other types? If so, what needs to be done to support other data types for property values?
>
> 2. No notion of VertexProperty. Properties are the same on Vertex and Edge in S2Graph, so we have to decide what our S2VertexProperty will be. Are we going to support this, or just say we can't provide it (for now, or at all)?
>
> 3. Vertex Id: S2Graph uses ServiceColumn + UserProvidedId as its internal vertex Id. We need to decide how we are going to map ServiceColumn onto tp3's Vertex. Are we going to serialize/deserialize ServiceColumn into tp3's Vertex label or not? This is not just about ServiceColumn; I also want to discuss further what S2Graph is going to provide through tp3's interface, and how.
>
> Please feel free to comment not only on the above but also on anything regarding tp3 support in general.
>
> Thanks.
