Re: [DISCUSSION] second release schedule and scope

Hwansung Yu Wed, 02 Aug 2017 20:34:32 -0700

I checkouted PR 115 (included daewon's commit) and executed the test.
All test were confirmed to have passed.


And then, I tried running gremlin example and this worked well.
As you might think, this is enough to release.

You did good job. This PR will make a great contribution to our community
as well.

Best Regards.

On Wed, Aug 2, 2017 at 3:27 PM, daewon <[email protected]> wrote:

> I agree with the above comments and I agree with the release.
>
> On Tue, Aug 1, 2017 at 7:45 PM DO YUNG YOON <[email protected]> wrote:
>
> > Updates on our second release scope and schedule.
> >
> > Since Hwansung suggest to resolve tinkerpop related issue before second
> > release, I was working on the S2GRAPH-151, S2GRAPH-148.
> >
> > Currently, S2GRAPH-151 is partially done(2 out of 4 subtasks are done)
> and
> > S2GRAPH-148 has PR ready.
> >
> > Please review https://github.com/apache/incubator-s2graph/pull/115 .
> >
> > As this point, I think we are ready for second release.
> >
> > Followings are issues I raised first.
> >
> > 1. provide provider optimization, we have none currently.
> > - S2GRAPH-153 has S2GraphStep optimization that lookup EdgeId/VertexId
> from
> > IndexProvider such as Lucene.
> > - Other optimization can be added on consecutive releases.
> >
> > 2. full text search predicate is not currently supported(as @echarles
> > pointed out)
> > - S2GRAPH-153 resolve this by using lucene as IndexProvider.
> > - g.V().has("name", "*steamshon*") will try to find EdgeId/VertexId from
> > IndexProvider then actually lookup Storage for Edge/Vertex.
> > - IndexProvider interface currently not optimized for large amount of
> > documents hit, but this can be improved later.
> >
> > 3. provide gremlin plugin
> > - S2GRAPH-148 provide subproject call s2graph-gremlin which contains
> > S2GraphGremlinPlugin.
> > - After merging https://github.com/apache/incubator-s2graph/pull/115,
> > users
> > can use gremlin-console to try out S2Graph.
> >
> > 4. make sure tinkerpop stack works correctly.
> > - S2GRAPH-148 make sure gremlin-conole is working properly.
> > - However, I found out it is too tedius to use scala code in
> > gremlin-console(groovy), so I think creating java client can improve
> > usability, but this also can be done later.
> >
> > In summary, I have resolved tinkerpop related issues, not totally, but
> just
> > enough for others to try out.
> >
> > I suggest to build our second release candidates at this point if there
> is
> > no objection.
> > I want to hear what others think.
> >
> >
> >
> > On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <[email protected]> wrote:
> >
> > > Thanks for your feedback. Here is my questions.
> > >
> > > 1. Release schedule:
> > > - Do you think we should wait until all issues with tinkerpop support
> > > resolved after?
> > >
> > > What others think about the release schedule?
> > >
> > > Should we wait until all of tinkerpop related issues resolving?
> > > Can you guys list up "must resolve" issues on our second release?
> > > The reason I mentioned index is I think it is the only one blocker
> issue
> > > from list for next release.
> > >
> > > 2. Full-Text search:
> > > - There would be 2 types of index support with
> variation(mixed/composite)
> > > - Graph-Index: s2graph do not have this type of index.
> > > - Composite-Index
> > > - Mixed-Index
> > > - Vertex-Centric-Index: s2graph do have this type of index.
> > >
> > >
> > > Since they are two different type of index, it is inevitable to provide
> > > them as separate option.
> > >
> > > I doubt there could be confusion between graph-index and
> > > vertex-centric-index and always clarify it on documentation.
> > >
> > > If we agree that graph index layer is necessary, then develop the
> > features
> > > first, then see if there could be confusion and decide what to do to
> > > clarify it. I think you agree that graph-index is necessary addition on
> > > project(tell me if you don't).
> > >
> > > Continue on more details on index topic.
> > >
> > > Following is what titan provide and I think it would be nice if we can
> > > provide this in S2Graph so let me briefly explain. (I suggest read
> > through
> > > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are
> not
> > > familiar with notations)
> > >
> > > 1. composite
> > >
> > > Composite indexes retrieve vertices or edges by one or a (fixed)
> > > composition of multiple keys.
> > >
> > > this example is how user can create composite index on titan.
> > >
> > > ```
> > > mgmt.buildIndex('byNameAndAgeComposite',
> > > Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
> > > mgmt.commit()
> > > ```
> > >
> > > then following traversal take benefit from `byNameComposite` index.
> > >
> > > ```
> > > g.V().has('age', 30).has('name', 'hercules')
> > > ```
> > >
> > > We can use HBase to store this index by creating row key as ("age", 30,
> > > "name", "hercules").
> > >
> > > ```
> > > g.V().has('name', 'hercules').has('age', 30)
> > > ```
> > >
> > > To answer above traveral, it seems to sort property key and value in
> > > composite index.
> > >
> > > we can also make partial composite index such as below.
> > >
> > > ```
> > > ("age", 30)
> > > ("name", "hercules")
> > > ```
> > >
> > > I am not sure if this is necessary. user can explicitly create above as
> > > seperate index such as 'byName', 'byAge'.
> > >
> > > One more suggestion is provide option to partition index, since there
> > > could be lots of vertices/edges that has specific value. for example,
> > > 'byCountryGender' index can contains lots of vertices/edges and it is
> > > problematic to store vertices/edges on same HBase region. we need to
> > > auto-partition theses into user specified number of partition by prefix
> > > salt. This is optimization step so can be revisited once we have
> > > functionality working.
> > >
> > > Note that composite index is only for comparing equality so following
> > > traversal can't take advantage of index.
> > >
> > > ```
> > > g.V().has('name', 'hercules').has('age', inside(20, 50))
> > > ```
> > >
> > > 2. mixed
> > >
> > > Mixed indexes retrieve vertices or edges by any combination of
> previously
> > > added property keys. full text search can be powered by mixed index,
> but
> > it
> > > may slower than composite index since it include external index backend
> > > search(lucene, solr, elasticsearch, ...).
> > >
> > > this example is how user can create mixed index on titan.
> > >
> > > ```
> > >
> > >
> > mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name,
> Mapping.TEXT.getParameter()).addKey(age,Mapping.TEXT.getParameter()).
> buildMixedIndex("search")
> > > ```
> > > user can decide use tokenizer when search engine index(named search) by
> > > specifing Mapping(String or TEXT, default TEXT provide full text
> search).
> > >
> > > then following traversal take benefit from `nameAndAge` index.
> > >
> > > ```
> > > g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
> > > g.V().has('name', textContains('hercules'))
> > > g.V().has('age', lt(50))
> > > ```
> > >
> > > we can use elasticsearch/lucene/solr as index backend for this type of
> > > index and actual tasks can be splitted by as following.
> > >
> > > If there is no objection, then I will create index task and list above
> > > subtasks under it.
> > >
> > > One possible tasks list can be described as following.
> > >
> > > 1. Management Client:
> > > - add option to speficy index type on creating ServiceColumn/Label.
> > > 2. Storage:
> > > - add method to build mutation for storage backend when set of
> > > vertexs/edges are given.
> > > - add method to call index backend with built mutation.
> > > 3. Serializer/Deserializer:
> > > - serializer: when a edge/vertex is given, build SKeyValue which can be
> > > used by storage methods.
> > > - deserializer: when byte array is given, build a Vertex/Edge that can
> be
> > > used by storage methods.
> > > 4. ProviderOptimization
> > > - tinkerpop ask provider to translate given traversal into
> implementation
> > > specific functions.
> > > - not sure if this is necessary with my limited knowledge so far, but
> > need
> > > to check once S2Graph internal provide composite/mixed index.
> > >
> > > Any feedback would be appreciated.
> > >
> > >
> > > On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <[email protected]>
> > wrote:
> > >
> > >> Sorry for late reply.
> > >>
> > >> I think it is important to implement Tinkerpop in terms of
> functionality
> > >> of
> > >> S2Graph and for the activation of the community.
> > >> I agree with your suggestion to concentrate on tinkerpop
> implementation
> > >> issues in the second release.
> > >> In my opinion, the time of release is when the tinkerpop
> implementation
> > >> issue is cleaned up.
> > >>
> > >> And with regard to full text search...
> > >> If full-text search is supported, we expect that constraints that were
> > >> able
> > >> to traversal will disappear only if the vertex is known.
> > >> If supported, it would be better to leave it as a separate option to
> > avoid
> > >> confusion with existing indexes.
> > >>
> > >> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <[email protected]>
> wrote:
> > >>
> > >> > I guess there is no objection on my suggestion, so I am going to try
> > >> list
> > >> > up issues in more detail while preparing 0.2.0 release on late this
> > >> month.
> > >> >
> > >> > Before list up above issues as task on jira, I want to discuss index
> > in
> > >> > more details.
> > >> >
> > >> > Following is my understanding on index to support tinkerpop fully
> and
> > >> > efficiently
> > >> > - reference:
> > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
> > >> >
> > >> > 1. graph index: traversal from a list of vertices or edges that are
> > >> > identified by their properties
> > >> >
> > >> > 2. vertex-centric index: traversal through vertices with many
> incident
> > >> > edges.
> > >> >
> > >> > I believe s2graph has vertex-centric index already, but it does not
> > have
> > >> > graph index layer so full text predicate, and range search features
> in
> > >> > tinkerpop runs very inefficiently.
> > >> >
> > >> > For example, following traversal run full scan.
> > >> >
> > >> > - g.V().has('name', 'hercules')
> > >> > - g.E().has('reason', textContains('loves'))
> > >> >
> > >> > To support full tinkerpop features efficiently, we need to add graph
> > >> index
> > >> > layer and I want to discuss how we are going to achieve this. like
> > >> > suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) using
> > >> > external
> > >> > search engine totally make sense to me.
> > >> >
> > >> > I suggest to design index management interface first, since graph
> > index
> > >> has
> > >> > never exist in S2Graph previously. then decision about index storage
> > >> > backend, implementation can be discussed in more detail(the other
> way
> > >> > around could also possible).
> > >> >
> > >> > Following is how user create index in s2graph currently.
> > >> >
> > >> > Management.createServiceColumn(
> > >> > serviceName = serviceName, columnName = "person", columnType =
> > >> "integer",
> > >> >     props = Seq(
> > >> >     Prop("name", "-", "string"),
> > >> >     Prop("age", "0", "integer"),
> > >> >     Prop("location", "-", "string")
> > >> >     )
> > >> > )
> > >> >
> > >> > management.createLabel(
> > >> > label = "bought",
> > >> >     srcServiceName = serviceName, srcColumnName = "person",
> > >> srcColumnType =
> > >> > "integer",
> > >> >     tgtServiceName = serviceName, tgtColumnName = "product",
> > >> tgtColumnType
> > >> > = "integer", idDirected = true,
> > >> >     serviceName = serviceName,
> > >> >     indices = Seq(
> > >> >     Index("PK", Seq("amount", "created_at")
> > >> >     ),
> > >> >     props = Seq(
> > >> >     Prop("amount", "0.0", "double"),
> > >> >     Prop("created_at", "2000-01-01", "string")
> > >> >     ),
> > >> >     consistencyLevel = "strong"
> > >> > )
> > >> >
> > >> > How we going to let user to create graph-index? Should we add extra
> > >> > parameters on existing methods, or provide separate methods?
> > >> >
> > >> >
> > >> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <[email protected]>
> > wrote:
> > >> >
> > >> > > Hi folks.
> > >> > >
> > >> > > It's been for a while we released our first release.
> > >> > > It seems that needs for implementing tinkerpop interface has been
> > >> high,
> > >> > > but we have not finished it. I have been working on
> > >> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April,
> then
> > >> > > recently merged it into master.
> > >> > >
> > >> > > I think Gremlin-core is tested, but following is what I think we
> > have
> > >> to
> > >> > > improve for tinkerpop users to try out s2graph easily.
> > >> > >
> > >> > > 1. provide provider optimization, we have none currently.
> > >> > > 2. full text search predicate is not currently supported(as
> > @echarles
> > >> > > pointed out)
> > >> > > 3. provide gremlin plugin
> > >> > > 4. make sure tinkerpop stack works correctly.
> > >> > >
> > >> > > Any help on above issues would be highly appreciated(help on any
> > other
> > >> > > issue would be also highly appreciated).
> > >> > >
> > >> > > By the way, What I want to discuss is the schedule and what will
> be
> > >> > > included on our second release.
> > >> > >
> > >> > > I suggest to focus on integrate with tinkerpop on our second
> > release.
> > >> It
> > >> > > would be best if we can address above issues by this month, but I
> > >> doubt
> > >> > if
> > >> > > it is possible.
> > >> > >
> > >> > > I am suggesting fix our release date on late this month, then
> focus
> > on
> > >> > > above issues with high priority. if we can address them all,
> great,
> > >> but
> > >> > if
> > >> > > we can't, then release with version as much as we can deliver in
> > time,
> > >> > then
> > >> > > move them on next next release so on.
> > >> > >
> > >> > > Want to hear what other folks think about focus and schedule on
> our
> > >> > second
> > >> > > release, and happy to volunteer as release manager for this time
> if
> > >> there
> > >> > > are no other volunteer.
> > >> > >
> > >> > > If there are other issues which anyone think to be included on
> next
> > >> > > release, please list them on this thread.
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> > > DO YUNG YOON
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSSION] second release schedule and scope

Reply via email to