Re: [DISCUSSION] second release schedule and scope

DO YUNG YOON Tue, 01 Aug 2017 03:46:20 -0700

Updates on our second release scope and schedule.

Since Hwansung suggested resolving the TinkerPop-related issues before the
second release, I have been working on S2GRAPH-151 and S2GRAPH-148.


Currently, S2GRAPH-151 is partially done (2 out of 4 subtasks are done) and
S2GRAPH-148 has a PR ready.

Please review https://github.com/apache/incubator-s2graph/pull/115 .

At this point, I think we are ready for the second release.

The following are the issues I raised at first.

1. provide provider optimization; we have none currently.
- S2GRAPH-153 has an S2GraphStep optimization that looks up EdgeId/VertexId
from an IndexProvider such as Lucene.
- Other optimizations can be added in subsequent releases.

2. full text search predicate is not currently supported (as @echarles
pointed out)
- S2GRAPH-153 resolves this by using Lucene as the IndexProvider.
- g.V().has("name", "*steamshon*") will first find the EdgeId/VertexId from
the IndexProvider, then look up the actual Edge/Vertex in Storage.
- The IndexProvider interface is currently not optimized for a large number
of document hits, but this can be improved later.
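
To make the two-step lookup in item 2 concrete, here is a minimal sketch. It is written in Python only for brevity and is not S2Graph's Scala code: `ToyIndexProvider`, `ToyStorage`, `has_text` and `batch_size` are hypothetical names, and the in-memory substring matcher merely stands in for Lucene. The batching shows one possible way to handle many hits, not what S2GRAPH-153 does.

```python
from typing import Dict, Iterator, List


class ToyIndexProvider:
    """Hypothetical in-memory stand-in for a Lucene-backed IndexProvider."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, str]] = {}  # vertex id -> indexed props

    def index(self, vertex_id: str, props: Dict[str, str]) -> None:
        self._docs[vertex_id] = dict(props)

    def fetch_ids(self, key: str, pattern: str) -> List[str]:
        # Toy matching: "*text*" means "contains text". A real index would
        # hand the pattern to its own query parser instead.
        needle = pattern.strip("*")
        return sorted(vid for vid, props in self._docs.items()
                      if needle in props.get(key, ""))


class ToyStorage:
    """Hypothetical stand-in for the storage backend (e.g. HBase)."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = dict(rows)

    def get_vertices(self, ids: List[str]) -> List[dict]:
        return [self._rows[i] for i in ids if i in self._rows]


def has_text(index: ToyIndexProvider, storage: ToyStorage, key: str,
             pattern: str, batch_size: int = 100) -> Iterator[dict]:
    # Step 1: ask the index for matching ids only.
    ids = index.fetch_ids(key, pattern)
    # Step 2: fetch the actual vertices from storage, in batches so a large
    # hit list does not turn into one huge storage request.
    for start in range(0, len(ids), batch_size):
        yield from storage.get_vertices(ids[start:start + batch_size])
```

The point of the split is that the index only ever returns ids, so the storage layer stays the single source of truth for Edge/Vertex data.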

3. provide gremlin plugin
- S2GRAPH-148 provides a subproject called s2graph-gremlin, which contains
S2GraphGremlinPlugin.
- After merging https://github.com/apache/incubator-s2graph/pull/115, users
can use gremlin-console to try out S2Graph.

4. make sure the tinkerpop stack works correctly.
- S2GRAPH-148 makes sure gremlin-console is working properly.
- However, I found that it is too tedious to use Scala code in
gremlin-console (Groovy), so I think creating a Java client would improve
usability, but this can also be done later.

In summary, I have resolved the TinkerPop-related issues, not completely, but
just enough for others to try it out.

I suggest building release candidates for our second release at this point
if there are no objections.
I want to hear what others think.



On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <[email protected]> wrote:

> Thanks for your feedback. Here are my questions.
>
> 1. Release schedule:
> - Do you think we should wait until all issues with tinkerpop support are
> resolved?
>
> What do others think about the release schedule?
>
> Should we wait until all of the tinkerpop-related issues are resolved?
> Can you list the "must resolve" issues for our second release?
> The reason I mentioned the index is that I think it is the only blocker
> issue on the list for the next release.
>
> 2. Full-Text search:
> - There would be 2 types of index, with variations (mixed/composite):
>   - Graph-Index: s2graph does not have this type of index.
>     - Composite-Index
>     - Mixed-Index
>   - Vertex-Centric-Index: s2graph does have this type of index.
>
>
> Since they are two different types of index, it is inevitable that we
> provide them as separate options.
>
> I doubt there would be confusion between graph-index and
> vertex-centric-index, and we can always clarify it in the documentation.
>
> If we agree that a graph index layer is necessary, then let's develop the
> feature first, then see whether there is confusion and decide what to do to
> clarify it. I think you agree that graph-index is a necessary addition to
> the project (tell me if you don't).
>
> Continuing with more details on the index topic.
>
> The following is what Titan provides, and I think it would be nice if we
> could provide it in S2Graph, so let me briefly explain. (I suggest reading
> through http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you
> are not familiar with the notation.)
>
> 1. composite
>
> Composite indexes retrieve vertices or edges by one or a (fixed)
> composition of multiple keys.
>
> This example shows how a user can create a composite index in Titan.
>
> ```
> mgmt.buildIndex('byNameAndAgeComposite',
> Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
> mgmt.commit()
> ```
>
> Then the following traversal benefits from the `byNameAndAgeComposite`
> index.
>
> ```
> g.V().has('age', 30).has('name', 'hercules')
> ```
>
> We can use HBase to store this index by creating a row key such as
> ("age", 30, "name", "hercules").
>
> ```
> g.V().has('name', 'hercules').has('age', 30)
> ```
>
> To answer the above traversal as well, it seems we need to sort the
> property keys (with their values) in the composite index.
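
One way to read the sorting idea above: normalize the (key, value) pairs by property key before building the row key, so that both orderings of has() resolve to the same row. A minimal sketch, in Python only for brevity and not S2Graph code; `composite_row_key`, the NUL separator and the use of `repr` for values are assumptions made for illustration.

```python
from typing import Any, Dict


def composite_row_key(index_name: str, conditions: Dict[str, Any]) -> bytes:
    """Build a canonical row key for an equality-only composite index."""
    parts = [index_name]
    # Sorting by property key makes has('age', 30).has('name', 'hercules')
    # and has('name', 'hercules').has('age', 30) produce identical keys.
    for key in sorted(conditions):
        parts.append(key)
        # repr() keeps the integer 30 distinct from the string '30'.
        parts.append(repr(conditions[key]))
    return "\x00".join(parts).encode("utf-8")
```

Because the key is built from exact values, this only helps equality lookups; a range predicate has no single row key to point at.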
>
> We can also make partial composite indexes such as the ones below.
>
> ```
> ("age", 30)
> ("name", "hercules")
> ```
>
> I am not sure if this is necessary. The user can explicitly create the
> above as separate indexes such as 'byName' and 'byAge'.
>
> One more suggestion is to provide an option to partition the index, since
> there could be lots of vertices/edges that have a specific value. For
> example, a 'byCountryGender' index can contain lots of vertices/edges, and
> it is problematic to store them all in the same HBase region. We need to
> auto-partition these into a user-specified number of partitions using a
> prefix salt. This is an optimization step, so it can be revisited once we
> have the functionality working.
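
The prefix-salt idea can be sketched as follows. This is Python for brevity and not an existing S2Graph API: `salt_for`, `salted_row_key`, `scan_prefixes`, the one-byte salt and the crc32 hashing are all assumptions. Writes are spread over salt buckets by element id, and a read has to fan out to every bucket.

```python
import zlib
from typing import List


def salt_for(element_id: str, num_partitions: int) -> int:
    if not 1 <= num_partitions <= 256:
        raise ValueError("num_partitions must fit in one salt byte (1..256)")
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(element_id.encode("utf-8")) % num_partitions


def salted_row_key(index_key: bytes, element_id: str,
                   num_partitions: int) -> bytes:
    # Layout: [salt byte][index key][NUL][element id]
    salt = salt_for(element_id, num_partitions)
    return bytes([salt]) + index_key + b"\x00" + element_id.encode("utf-8")


def scan_prefixes(index_key: bytes, num_partitions: int) -> List[bytes]:
    if not 1 <= num_partitions <= 256:
        raise ValueError("num_partitions must fit in one salt byte (1..256)")
    # A reader scans one prefix per partition and merges the results.
    return [bytes([salt]) + index_key + b"\x00"
            for salt in range(num_partitions)]
```

The trade-off is the usual one for salting: hot index values no longer pile up in one region, but every lookup costs as many scans as there are partitions.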
>
> Note that a composite index only supports equality comparison, so the
> following traversal can't take advantage of the index.
>
> ```
> g.V().has('name', 'hercules').has('age', inside(20, 50))
> ```
>
> 2. mixed
>
> Mixed indexes retrieve vertices or edges by any combination of previously
> added property keys. Full text search can be powered by a mixed index, but
> it may be slower than a composite index since it involves a search on an
> external index backend (lucene, solr, elasticsearch, ...).
>
> This example shows how a user can create a mixed index in Titan.
>
> ```
> mgmt.buildIndex('nameAndAge', Vertex.class).
>     addKey(name, Mapping.TEXT.getParameter()).
>     addKey(age, Mapping.TEXT.getParameter()).
>     buildMixedIndex("search")
> ```
> The user can decide whether the search engine index (named "search") uses
> a tokenizer by specifying the Mapping (STRING or TEXT; the default, TEXT,
> provides full text search).
>
> Then the following traversals benefit from the `nameAndAge` index.
>
> ```
> g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
> g.V().has('name', textContains('hercules'))
> g.V().has('age', lt(50))
> ```
>
> We can use elasticsearch/lucene/solr as the index backend for this type of
> index, and the actual work can be split as follows.
>
> If there is no objection, I will create an index task and list the
> subtasks below under it.
>
> One possible task list is the following.
>
> 1. Management Client:
> - add an option to specify the index type when creating a
> ServiceColumn/Label.
> 2. Storage:
> - add a method to build mutations for the storage backend when a set of
> vertices/edges is given.
> - add a method to call the index backend with the built mutations.
> 3. Serializer/Deserializer:
> - serializer: when an edge/vertex is given, build an SKeyValue which can
> be used by storage methods.
> - deserializer: when a byte array is given, build a Vertex/Edge that can
> be used by storage methods.
> 4. ProviderOptimization
> - tinkerpop asks the provider to translate a given traversal into
> implementation-specific functions.
> - with my limited knowledge so far I am not sure if this is necessary, but
> we need to check once S2Graph internally provides composite/mixed indexes.
>
> Any feedback would be appreciated.
>
>
> On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <[email protected]> wrote:
>
>> Sorry for the late reply.
>>
>> I think implementing Tinkerpop is important, both for the functionality of
>> S2Graph and for activating the community.
>> I agree with your suggestion to concentrate on tinkerpop implementation
>> issues in the second release.
>> In my opinion, the time to release is when the tinkerpop implementation
>> issues are cleaned up.
>>
>> And with regard to full text search...
>> If full-text search is supported, we expect that the constraint that a
>> traversal is possible only when the vertex is known will disappear.
>> If supported, it would be better to leave it as a separate option to avoid
>> confusion with existing indexes.
>>
>> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <[email protected]> wrote:
>>
>> > I guess there is no objection to my suggestion, so I am going to try to
>> > list the issues in more detail while preparing the 0.2.0 release for late
>> > this month.
>> >
>> > Before listing the above issues as tasks on JIRA, I want to discuss the
>> > index in more detail.
>> >
>> > The following is my understanding of the indexes needed to support
>> > tinkerpop fully and efficiently.
>> > - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
>> >
>> > 1. graph index: traversal from a list of vertices or edges that are
>> > identified by their properties
>> >
>> > 2. vertex-centric index: traversal through vertices with many incident
>> > edges.
>> >
>> > I believe s2graph already has a vertex-centric index, but it does not have
>> > a graph index layer, so the full text predicate and range search features
>> > in tinkerpop run very inefficiently.
>> >
>> > For example, the following traversals run a full scan.
>> >
>> > - g.V().has('name', 'hercules')
>> > - g.E().has('reason', textContains('loves'))
>> >
>> > To support the full set of tinkerpop features efficiently, we need to add
>> > a graph index layer, and I want to discuss how we are going to achieve
>> > this. As suggested here (http://markmail.org/message/2vn2bwrwh5zbeie4),
>> > using an external search engine totally makes sense to me.
>> >
>> > I suggest designing the index management interface first, since a graph
>> > index has never existed in S2Graph before. Then the decision about the
>> > index storage backend and the implementation can be discussed in more
>> > detail (the other way around is also possible).
>> >
>> > The following is how a user creates an index in s2graph currently.
>> >
>> > Management.createServiceColumn(
>> >     serviceName = serviceName, columnName = "person",
>> >     columnType = "integer",
>> >     props = Seq(
>> >         Prop("name", "-", "string"),
>> >         Prop("age", "0", "integer"),
>> >         Prop("location", "-", "string")
>> >     )
>> > )
>> >
>> > management.createLabel(
>> >     label = "bought",
>> >     srcServiceName = serviceName, srcColumnName = "person",
>> >     srcColumnType = "integer",
>> >     tgtServiceName = serviceName, tgtColumnName = "product",
>> >     tgtColumnType = "integer", isDirected = true,
>> >     serviceName = serviceName,
>> >     indices = Seq(
>> >         Index("PK", Seq("amount", "created_at"))
>> >     ),
>> >     props = Seq(
>> >         Prop("amount", "0.0", "double"),
>> >         Prop("created_at", "2000-01-01", "string")
>> >     ),
>> >     consistencyLevel = "strong"
>> > )
>> >
>> > How are we going to let users create a graph-index? Should we add extra
>> > parameters to the existing methods, or provide separate methods?
>> >
>> >
>> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <[email protected]> wrote:
>> >
>> > > Hi folks.
>> > >
>> > > It has been a while since we made our first release.
>> > > It seems the need for implementing the tinkerpop interface has been
>> > > high, but we have not finished it. I have been working on
>> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April, and
>> > > recently merged it into master.
>> > >
>> > > I think Gremlin-core is tested, but the following is what I think we
>> > > have to improve for tinkerpop users to try out s2graph easily.
>> > >
>> > > 1. provide provider optimization, we have none currently.
>> > > 2. full text search predicate is not currently supported (as @echarles
>> > > pointed out)
>> > > 3. provide gremlin plugin
>> > > 4. make sure tinkerpop stack works correctly.
>> > >
>> > > Any help on the above issues would be highly appreciated (help on any
>> > > other issue would also be highly appreciated).
>> > >
>> > > By the way, what I want to discuss is the schedule and what will be
>> > > included in our second release.
>> > >
>> > > I suggest focusing on integrating with tinkerpop in our second release.
>> > > It would be best if we could address the above issues by this month, but
>> > > I doubt it is possible.
>> > >
>> > > I suggest fixing our release date for late this month, then focusing on
>> > > the above issues with high priority. If we can address them all, great,
>> > > but if we can't, then we release whatever we can deliver in time and
>> > > move the rest to the next release, and so on.
>> > >
>> > > I want to hear what other folks think about the focus and schedule of
>> > > our second release, and I am happy to volunteer as release manager this
>> > > time if there are no other volunteers.
>> > >
>> > > If there are other issues that anyone thinks should be included in the
>> > > next release, please list them on this thread.
>> > >
>> > > Thanks
>> > >
>> > > DO YUNG YOON
>> > >
>> > >
>> >
>>
>
