Seems like everyone agrees on second release and I am going to work on building release candidates this weekend.
Thanks for testing and giving feedbacks. On Thu, Aug 3, 2017 at 12:34 PM Hwansung Yu <[email protected]> wrote: > I checkouted PR 115 (included daewon's commit) and executed the test. > All test were confirmed to have passed. > > And then, I tried running gremlin example and this worked well. > As you might think, this is enough to release. > > You did good job. This PR will make a great contribution to our community > as well. > > Best Regards. > > On Wed, Aug 2, 2017 at 3:27 PM, daewon <[email protected]> wrote: > > > I agree with the above comments and I agree with the release. > > > > On Tue, Aug 1, 2017 at 7:45 PM DO YUNG YOON <[email protected]> wrote: > > > > > Updates on our second release scope and schedule. > > > > > > Since Hwansung suggest to resolve tinkerpop related issue before second > > > release, I was working on the S2GRAPH-151, S2GRAPH-148. > > > > > > Currently, S2GRAPH-151 is partially done(2 out of 4 subtasks are done) > > and > > > S2GRAPH-148 has PR ready. > > > > > > Please review https://github.com/apache/incubator-s2graph/pull/115 . > > > > > > As this point, I think we are ready for second release. > > > > > > Followings are issues I raised first. > > > > > > 1. provide provider optimization, we have none currently. > > > - S2GRAPH-153 has S2GraphStep optimization that lookup EdgeId/VertexId > > from > > > IndexProvider such as Lucene. > > > - Other optimization can be added on consecutive releases. > > > > > > 2. full text search predicate is not currently supported(as @echarles > > > pointed out) > > > - S2GRAPH-153 resolve this by using lucene as IndexProvider. > > > - g.V().has("name", "*steamshon*") will try to find EdgeId/VertexId > from > > > IndexProvider then actually lookup Storage for Edge/Vertex. > > > - IndexProvider interface currently not optimized for large amount of > > > documents hit, but this can be improved later. > > > > > > 3. provide gremlin plugin > > > - S2GRAPH-148 provide subproject call s2graph-gremlin which contains > > > S2GraphGremlinPlugin. > > > - After merging https://github.com/apache/incubator-s2graph/pull/115, > > > users > > > can use gremlin-console to try out S2Graph. > > > > > > 4. make sure tinkerpop stack works correctly. > > > - S2GRAPH-148 make sure gremlin-conole is working properly. > > > - However, I found out it is too tedius to use scala code in > > > gremlin-console(groovy), so I think creating java client can improve > > > usability, but this also can be done later. > > > > > > In summary, I have resolved tinkerpop related issues, not totally, but > > just > > > enough for others to try out. > > > > > > I suggest to build our second release candidates at this point if there > > is > > > no objection. > > > I want to hear what others think. > > > > > > > > > > > > On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <[email protected]> wrote: > > > > > > > Thanks for your feedback. Here is my questions. > > > > > > > > 1. Release schedule: > > > > - Do you think we should wait until all issues with tinkerpop support > > > > resolved after? > > > > > > > > What others think about the release schedule? > > > > > > > > Should we wait until all of tinkerpop related issues resolving? > > > > Can you guys list up "must resolve" issues on our second release? > > > > The reason I mentioned index is I think it is the only one blocker > > issue > > > > from list for next release. > > > > > > > > 2. Full-Text search: > > > > - There would be 2 types of index support with > > variation(mixed/composite) > > > > - Graph-Index: s2graph do not have this type of index. > > > > - Composite-Index > > > > - Mixed-Index > > > > - Vertex-Centric-Index: s2graph do have this type of index. > > > > > > > > > > > > Since they are two different type of index, it is inevitable to > provide > > > > them as separate option. > > > > > > > > I doubt there could be confusion between graph-index and > > > > vertex-centric-index and always clarify it on documentation. > > > > > > > > If we agree that graph index layer is necessary, then develop the > > > features > > > > first, then see if there could be confusion and decide what to do to > > > > clarify it. I think you agree that graph-index is necessary addition > on > > > > project(tell me if you don't). > > > > > > > > Continue on more details on index topic. > > > > > > > > Following is what titan provide and I think it would be nice if we > can > > > > provide this in S2Graph so let me briefly explain. (I suggest read > > > through > > > > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are > > not > > > > familiar with notations) > > > > > > > > 1. composite > > > > > > > > Composite indexes retrieve vertices or edges by one or a (fixed) > > > > composition of multiple keys. > > > > > > > > this example is how user can create composite index on titan. > > > > > > > > ``` > > > > mgmt.buildIndex('byNameAndAgeComposite', > > > > Vertex.class).addKey(name).addKey(age).buildCompositeIndex() > > > > mgmt.commit() > > > > ``` > > > > > > > > then following traversal take benefit from `byNameComposite` index. > > > > > > > > ``` > > > > g.V().has('age', 30).has('name', 'hercules') > > > > ``` > > > > > > > > We can use HBase to store this index by creating row key as ("age", > 30, > > > > "name", "hercules"). > > > > > > > > ``` > > > > g.V().has('name', 'hercules').has('age', 30) > > > > ``` > > > > > > > > To answer above traveral, it seems to sort property key and value in > > > > composite index. > > > > > > > > we can also make partial composite index such as below. > > > > > > > > ``` > > > > ("age", 30) > > > > ("name", "hercules") > > > > ``` > > > > > > > > I am not sure if this is necessary. user can explicitly create above > as > > > > seperate index such as 'byName', 'byAge'. > > > > > > > > One more suggestion is provide option to partition index, since there > > > > could be lots of vertices/edges that has specific value. for example, > > > > 'byCountryGender' index can contains lots of vertices/edges and it is > > > > problematic to store vertices/edges on same HBase region. we need to > > > > auto-partition theses into user specified number of partition by > prefix > > > > salt. This is optimization step so can be revisited once we have > > > > functionality working. > > > > > > > > Note that composite index is only for comparing equality so following > > > > traversal can't take advantage of index. > > > > > > > > ``` > > > > g.V().has('name', 'hercules').has('age', inside(20, 50)) > > > > ``` > > > > > > > > 2. mixed > > > > > > > > Mixed indexes retrieve vertices or edges by any combination of > > previously > > > > added property keys. full text search can be powered by mixed index, > > but > > > it > > > > may slower than composite index since it include external index > backend > > > > search(lucene, solr, elasticsearch, ...). > > > > > > > > this example is how user can create mixed index on titan. > > > > > > > > ``` > > > > > > > > > > > mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name, > > Mapping.TEXT.getParameter()).addKey(age,Mapping.TEXT.getParameter()). > > buildMixedIndex("search") > > > > ``` > > > > user can decide use tokenizer when search engine index(named search) > by > > > > specifing Mapping(String or TEXT, default TEXT provide full text > > search). > > > > > > > > then following traversal take benefit from `nameAndAge` index. > > > > > > > > ``` > > > > g.V().has('name', textContains('hercules')).has('age', inside(20, > 50)) > > > > g.V().has('name', textContains('hercules')) > > > > g.V().has('age', lt(50)) > > > > ``` > > > > > > > > we can use elasticsearch/lucene/solr as index backend for this type > of > > > > index and actual tasks can be splitted by as following. > > > > > > > > If there is no objection, then I will create index task and list > above > > > > subtasks under it. > > > > > > > > One possible tasks list can be described as following. > > > > > > > > 1. Management Client: > > > > - add option to speficy index type on creating ServiceColumn/Label. > > > > 2. Storage: > > > > - add method to build mutation for storage backend when set of > > > > vertexs/edges are given. > > > > - add method to call index backend with built mutation. > > > > 3. Serializer/Deserializer: > > > > - serializer: when a edge/vertex is given, build SKeyValue which can > be > > > > used by storage methods. > > > > - deserializer: when byte array is given, build a Vertex/Edge that > can > > be > > > > used by storage methods. > > > > 4. ProviderOptimization > > > > - tinkerpop ask provider to translate given traversal into > > implementation > > > > specific functions. > > > > - not sure if this is necessary with my limited knowledge so far, but > > > need > > > > to check once S2Graph internal provide composite/mixed index. > > > > > > > > Any feedback would be appreciated. > > > > > > > > > > > > On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <[email protected]> > > > wrote: > > > > > > > >> Sorry for late reply. > > > >> > > > >> I think it is important to implement Tinkerpop in terms of > > functionality > > > >> of > > > >> S2Graph and for the activation of the community. > > > >> I agree with your suggestion to concentrate on tinkerpop > > implementation > > > >> issues in the second release. > > > >> In my opinion, the time of release is when the tinkerpop > > implementation > > > >> issue is cleaned up. > > > >> > > > >> And with regard to full text search... > > > >> If full-text search is supported, we expect that constraints that > were > > > >> able > > > >> to traversal will disappear only if the vertex is known. > > > >> If supported, it would be better to leave it as a separate option to > > > avoid > > > >> confusion with existing indexes. > > > >> > > > >> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <[email protected]> > > wrote: > > > >> > > > >> > I guess there is no objection on my suggestion, so I am going to > try > > > >> list > > > >> > up issues in more detail while preparing 0.2.0 release on late > this > > > >> month. > > > >> > > > > >> > Before list up above issues as task on jira, I want to discuss > index > > > in > > > >> > more details. > > > >> > > > > >> > Following is my understanding on index to support tinkerpop fully > > and > > > >> > efficiently > > > >> > - reference: > > > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html > > > >> > > > > >> > 1. graph index: traversal from a list of vertices or edges that > are > > > >> > identified by their properties > > > >> > > > > >> > 2. vertex-centric index: traversal through vertices with many > > incident > > > >> > edges. > > > >> > > > > >> > I believe s2graph has vertex-centric index already, but it does > not > > > have > > > >> > graph index layer so full text predicate, and range search > features > > in > > > >> > tinkerpop runs very inefficiently. > > > >> > > > > >> > For example, following traversal run full scan. > > > >> > > > > >> > - g.V().has('name', 'hercules') > > > >> > - g.E().has('reason', textContains('loves')) > > > >> > > > > >> > To support full tinkerpop features efficiently, we need to add > graph > > > >> index > > > >> > layer and I want to discuss how we are going to achieve this. like > > > >> > suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) > using > > > >> > external > > > >> > search engine totally make sense to me. > > > >> > > > > >> > I suggest to design index management interface first, since graph > > > index > > > >> has > > > >> > never exist in S2Graph previously. then decision about index > storage > > > >> > backend, implementation can be discussed in more detail(the other > > way > > > >> > around could also possible). > > > >> > > > > >> > Following is how user create index in s2graph currently. > > > >> > > > > >> > Management.createServiceColumn( > > > >> > serviceName = serviceName, columnName = "person", columnType = > > > >> "integer", > > > >> > props = Seq( > > > >> > Prop("name", "-", "string"), > > > >> > Prop("age", "0", "integer"), > > > >> > Prop("location", "-", "string") > > > >> > ) > > > >> > ) > > > >> > > > > >> > management.createLabel( > > > >> > label = "bought", > > > >> > srcServiceName = serviceName, srcColumnName = "person", > > > >> srcColumnType = > > > >> > "integer", > > > >> > tgtServiceName = serviceName, tgtColumnName = "product", > > > >> tgtColumnType > > > >> > = "integer", idDirected = true, > > > >> > serviceName = serviceName, > > > >> > indices = Seq( > > > >> > Index("PK", Seq("amount", "created_at") > > > >> > ), > > > >> > props = Seq( > > > >> > Prop("amount", "0.0", "double"), > > > >> > Prop("created_at", "2000-01-01", "string") > > > >> > ), > > > >> > consistencyLevel = "strong" > > > >> > ) > > > >> > > > > >> > How we going to let user to create graph-index? Should we add > extra > > > >> > parameters on existing methods, or provide separate methods? > > > >> > > > > >> > > > > >> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <[email protected]> > > > wrote: > > > >> > > > > >> > > Hi folks. > > > >> > > > > > >> > > It's been for a while we released our first release. > > > >> > > It seems that needs for implementing tinkerpop interface has > been > > > >> high, > > > >> > > but we have not finished it. I have been working on > > > >> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April, > > then > > > >> > > recently merged it into master. > > > >> > > > > > >> > > I think Gremlin-core is tested, but following is what I think we > > > have > > > >> to > > > >> > > improve for tinkerpop users to try out s2graph easily. > > > >> > > > > > >> > > 1. provide provider optimization, we have none currently. > > > >> > > 2. full text search predicate is not currently supported(as > > > @echarles > > > >> > > pointed out) > > > >> > > 3. provide gremlin plugin > > > >> > > 4. make sure tinkerpop stack works correctly. > > > >> > > > > > >> > > Any help on above issues would be highly appreciated(help on any > > > other > > > >> > > issue would be also highly appreciated). > > > >> > > > > > >> > > By the way, What I want to discuss is the schedule and what will > > be > > > >> > > included on our second release. > > > >> > > > > > >> > > I suggest to focus on integrate with tinkerpop on our second > > > release. > > > >> It > > > >> > > would be best if we can address above issues by this month, but > I > > > >> doubt > > > >> > if > > > >> > > it is possible. > > > >> > > > > > >> > > I am suggesting fix our release date on late this month, then > > focus > > > on > > > >> > > above issues with high priority. if we can address them all, > > great, > > > >> but > > > >> > if > > > >> > > we can't, then release with version as much as we can deliver in > > > time, > > > >> > then > > > >> > > move them on next next release so on. > > > >> > > > > > >> > > Want to hear what other folks think about focus and schedule on > > our > > > >> > second > > > >> > > release, and happy to volunteer as release manager for this time > > if > > > >> there > > > >> > > are no other volunteer. > > > >> > > > > > >> > > If there are other issues which anyone think to be included on > > next > > > >> > > release, please list them on this thread. > > > >> > > > > > >> > > Thanks > > > >> > > > > > >> > > DO YUNG YOON > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > > > > >
