Re: [DISCUSS] S2Graph Incubator Proposal

Andrew Purtell Mon, 09 Nov 2015 11:51:13 -0800

If you are looking for mentors let me volunteer as one.

I think S2Graph has the potential to be a good addition to the Apache
family given its relationships and dependencies with other Apache projects
from the outset.




On Mon, Nov 9, 2015 at 10:54 AM, Hyunsik Choi <hyun...@apache.org> wrote:

> This project is looking for mentors. Can anyone help? We are also
> looking forward to any feedback.
>
> Also, I have attached the proposal here; I forgot it earlier.
>
> ----------------
>
> = S2Graph Proposal =
>
> == Abstract ==
> S2Graph is a distributed and scalable OLTP graph database built on
> HBase to support fast traversal of extremely large graphs.
>
> Here are additional materials to introduce S2Graph.
>  * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
>  * Apache: Big Data 2015 -
> http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
>
> == Proposal ==
> S2Graph aims to provide a scalable distributed graph database engine
> over key/value storage such as HBase. S2Graph provides a fully
> asynchronous API to manipulate data as a property graph model, and
> fast breadth-first search queries on the graph.
>
> == Background ==
> S2Graph initially started as an internal project at Kakao.com to
> efficiently store user relations and user activities as one large
> graph and provide unified queries to traverse the graph. It was open
> sourced on GitHub about 3 months ago, in June 2015.
>
> Over time S2Graph, together with HBase as its storage tier, has begun
> to be adopted in various applications, such as messaging, social
> feeds, and realtime recommendations at Kakao.
>
> Users can benefit from S2Graph's generalized high-level API for graph
> abstraction instead of a low-level key/value API, just as Phoenix
> provides a SQL layer over HBase.
>
> == Rationale ==
> Graph data (highly interconnected data) is very abundant and important
> these days.
> When users have a multitude of relationships, each with complex
> properties associated with them, the graph model is more intuitive and
> efficient than a tabular format (RDBMS).
> There are many ASF projects that provide a SQL layer, but there is no
> ASF project that provides a scalable graph layer on the existing
> Hadoop ecosystem.
> When graph data grows to trillion-edge scale, traversal becomes slow
> and costly. However, with the benefit of HBase's scalable
> architecture, S2Graph can traverse large graphs efficiently in a
> breadth-first manner.
>
> S2Graph also interoperates with several existing Apache projects
> (HBase, Spark) to provide a way to merge real-time events and
> batch-processed data using the property graph data model.
>
> Many developers are running their own domain-specific API servers to
> serve their data products, but the graph model is general and the
> S2Graph API fully supports graph traversal, so it can be used as a
> scalable general-purpose API serving layer for various domains.
> As long as data can be modeled as a graph, users can avoid the tedious
> work of developing customized API servers by using S2Graph.
>
> == Initial Goals ==
> The initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is
> accomplished, we plan for incremental development and releases that
> follow the Apache guidelines.
>
> == Current Status ==
>
> === Meritocracy ===
> S2Graph operated on meritocratic principles from the get go.
> Currently, all the discussions pertaining to S2Graph development are
> public on Github. The current incubation
> proposal includes the major code contributors to S2Graph. Several
> additional people have worked on the S2Graph codebase for industry use
> cases and would be interested in becoming committers. We are starting
> with a small committer group and we plan to add additional committers
> following an open merit-based decision process during the incubation
> phase.
>
> === Community ===
> We have already begun building a community but at this time the
> community consists only of S2Graph developers – all Kakao employees –
> and prospective users.
> S2Graph seeks to develop developer and user communities during incubation.
>
> === Core Developers ===
> S2Graph is currently being designed and developed by 2 engineers from
> Kakao: Doyung Yoon and Daewon Jeong.
>
> === Alignment ===
> Our proposed S2Graph effort aligns closely with Apache HBase. The
> HBase project perimeter is defined by simple byte-array based
> Create, Read, Update, Delete and Scan APIs, with no current plans to
> extend beyond these bounds.
>
> S2Graph complements this with a higher-level API for the property
> graph model.
>
> S2Graph was designed from the beginning to offer a scalable
> distributed graph database layer over HBase, providing the property
> graph model and breadth-first search, and it continues to focus on
> providing the graph model.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers of the S2Graph team plan to work full time on this
> project. There is very little risk of S2Graph getting orphaned since
> at least one large company (Kakao) is extensively using it in their
> production HBase clusters. For example, there are currently 20+ use
> cases with more than 1 trillion edges and 140 million breadth-first
> search query requests per minute using S2Graph in production.
> We plan to extend and diversify this community further through Apache.
>
> === Inexperience with Open Source ===
> The core developers are all active users and followers of open source.
> They are already committers and contributors to the S2Graph Github
> project. All have been involved with the source code that has been
> released under an open source license. Though the core developers do
> not have Apache open source experience, there are plans to onboard
> individuals with Apache open source experience onto the project.
>
> === Homogenous Developers ===
> Most committers in this proposal belong to the same institution
> (Kakao). The engagement of these committers goes well beyond the
> necessary development to support research, and all committers work on
> S2Graph full time.
> Several people from other institutions are working on and are familiar
> with the S2Graph codebase. We will work to attract them as future
> committers during the incubation phase, following a merit-based
> approach.
>
> === Reliance on Salaried Developers ===
> Kakao invested in S2Graph as the distributed graph database solution
> on top of HBase and some of its key engineers are working full time on
> the project.
> We look forward to contributions from other Apache developers and
> researchers.
> Also key to addressing the risk associated with relying on salaried
> developers from a single entity is to increase the diversity of the
> contributors and actively lobby for domain experts in the graph
> database space to contribute. Apache S2Graph intends to do this.
>
> === Relationships with Other Apache Products ===
> S2Graph has a strong relationship with, and dependency on, Apache
> HBase and Spark.
> Being part of Apache’s Incubation community could help with closer
> collaboration with these two projects as well as others.
>
> In terms of graph processing frameworks, S2Graph and Apache Giraph
> look similar. However, their goals are clearly different. Giraph
> aims at analytical batch processing on immutable graph
> data sets. In contrast, S2Graph is designed for OLTP-like workloads on
> graph data sets, and S2Graph provides INSERT/UPDATE operations too.
>
>
> === An Excessive Fascination with the Apache Brand ===
> S2Graph is proposing to enter incubation at Apache in order to help
> efforts to diversify the committer-base, not so much to capitalize on
> the Apache brand. The S2Graph project is in production use already
> inside Kakao, but is not expected to be a Kakao product for external
> customers. As such, the S2Graph project is not seeking to use the
> Apache brand as a marketing tool.
>
> == Documentation ==
> Information about S2Graph can be found at
> https://github.com/kakao/s2graph. The following links provide more
> information about S2Graph in open source:
>  * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
>  * Codebase at Github: https://github.com/kakao/s2graph
>  * Issue Tracking: https://github.com/kakao/s2graph/issues
>  * User community: https://groups.google.com/forum/#!forum/s2graph
>
> == Initial Source ==
>
> The S2Graph codebase is currently hosted on Github:
> https://github.com/kakao/s2graph
>
> === Source and Intellectual Property Submission Plan ===
>
> Currently, the S2Graph codebase is distributed under the Apache 2.0
> License.
>
> == External Dependencies ==
>
> Beyond relying on Apache HBase, S2Graph has the following external
> dependencies:
>  * Asynchbase (BSD license: http://www.antlr3.org/license.html)
>  * Mysql (BSD license:
> https://github.com/julianhyde/sqlline/blob/master/LICENSE)
>  * Play Framework (Apache 2.0 license:
> https://github.com/playframework/playframework)
>  * Scala (https://github.com/scala/scala)
>  * Spark
>  * Kafka
>
> == Required Resources ==
>
> === Mailing list ===
>
> We will migrate our mailing lists to the following:
>  * us...@s2graph.incubator.apache.org
>  * d...@s2graph.incubator.apache.org
>  * priv...@s2graph.incubator.apache.org
>  * comm...@s2graph.incubator.apache.org
>
> === Source control ===
>
> The S2Graph team would like to use Git for source control, due to our
> current use of Git. We request a writeable Git repo for S2Graph, and
> mirroring to be set up to Github through INFRA.
>
> === Issue Tracking ===
>
> S2Graph currently uses the GitHub issue tracking system associated
> with its GitHub repo: https://github.com/kakao/s2graph/issues. We will
> migrate to the Apache JIRA:
> http://issues.apache.org/jira/browse/S2Graph
>
> === Other Resources ===
>
>  * Jenkins/Hudson for builds and test running
>  * Wiki for documentation purposes
>  * Blog to improve project dissemination
>
> == Initial Committers ==
>
>  * Doyung Yoon <shom83 at gmail.com>
>  * Daewon Jeong <blueiur at gmail.com>
>  * Jaesang Kim <honeysleep at gmail.com>
>  * Hwansung Yu <deejayfwan at gmail.com>
>  * Min-Seok Kim <mskim.org at gmail.com>
>  * Chul Kang <miralchul at gmail.com>
>
> == Affiliations ==
>
> The initial committers are all from one organization: Kakao.
>  * Doyung Yoon, Kakao
>  * Daewon Jeong, Kakao
>  * Jaesang Kim, Kakao
>  * Hwansung Yu, Kakao
>  * Min-Seok Kim, Kakao
>  * Chul Kang, Kakao
>
> == Sponsors ==
>
> === Champion ===
> Hyunsik Choi
>
> === Nominated Mentors ===
>
> === Sponsoring Entity ===
>
>  * The Apache Incubator
>
> On Fri, Nov 6, 2015 at 4:05 PM, Hyunsik Choi <hyun...@apache.org> wrote:
> > Hi Seetharam,
> >
> > Thank you for a good question. That seems to be a frequent question
> > about this project.
> >
> > Here is the answer to your question.
> >
> > https://steamshon.gitbooks.io/s2graph-book/content/what_is_different_to_titan.html
> >
> > I hope that this link is helpful to your understanding.
> >
> > Best regards,
> > Hyunsik
> >
> >
> >
> > On Fri, Nov 6, 2015 at 3:07 PM, Seetharam Venkatesh
> > <venkat...@innerzeal.com> wrote:
> >> Hi Hyunsik,
> >>
> >> The proposal looks interesting, and I want to know how this is
> >> different from existing solutions in the same space, such as Titan.
> >>
> >> Thanks!
> >> Venkatesh
> >>
> >>
> >> On Fri, Nov 6, 2015 at 1:36 PM Hyunsik Choi <hyun...@apache.org> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> We would like to start a discussion on S2Graph as an incubation
> >>> project.
> >>>
> >>> S2Graph is a distributed and scalable OLTP graph database built on
> >>> HBase. It provides interactive queries for vertex/edge/sub-graphs on
> >>> extremely large graph data sets as well as insertion and update
> >>> operations.
> >>>
> >>> S2Graph was already introduced at Apache: Big Data and HBaseCon
> >>> this year.
> >>>
> >>> The proposal is available at:
> >>> https://wiki.apache.org/incubator/S2GraphProposal
> >>>
> >>> We are looking forward to any feedback. In addition, we are looking
> >>> for volunteers as mentors.
> >>>
> >>> Best regards,
> >>> Hyunsik
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>> For additional commands, e-mail: general-h...@incubator.apache.org
> >>>
> >>>
>
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
