I agree with integrating the Spark-related projects (streaming and batch). These projects have similar functionality, so there is a lot of code duplication.
Integration should make it simpler to retire projects and to reduce redundancy across them.

Once the streaming and batch projects are integrated, I would also like to discuss removing the 's2rest_netty' project in order to consolidate the HTTP layer. There was some discussion of the HTTP layer in S2GRAPH-85 (https://issues.apache.org/jira/browse/S2GRAPH-85).

On Mon, Nov 28, 2016 at 7:16 PM DO YUNG YOON <[email protected]> wrote:

> Hi folks.
>
> I think we should discuss what we provide as subproject until next release.
>
> Since initial code imports to apache, we have not worked on other
> subprojects except s2core, s2rest_play.
>
> Here is what you can find in each subproject(from our README).
>
> 1. s2core: The core library, containing the data abstractions for graph
>    entities, storage adapters and utilities.
> 2. s2rest_play: The REST server built with Play framework
>    <https://www.playframework.com/>, providing the write and query API.
> 3. s2rest_netty: The REST server built directly using Netty,
>    implementing only the query API.
> 4. loader: A collection of Spark jobs for bulk loading streaming data
>    into S2Graph.
> 5. spark: Spark utilities for loader and s2counter_loader.
> 6. s2counter_core: The core library providing data structures and logics
>    for s2counter_loader.
> 7. s2counter_loader: Spark streaming jobs that consume Kafka WAL logs
>    and calculate various top-*K* results on-the-fly.
>
>
> I want to suggest to merge loader, spark, s2counter_loader into one project
> called s2loader, make it responsible for streaming/batch utils to work with
> S2Graph.
>
> The reason behind of this is improving codebase(we have lots of duplicate
> codes currently and it seems quite abandoned).
>
> Also documentations are missed so we should provide firm documentation to
> help others to understand them.
>
> Finally there is no specs and test cases. I think adding test cases is
> important because we can start refactor our code to easily testable one.
>
> I have opened discussion thread at
> http://markmail.org/message/3j2hbfquwwybyz4e but not enough attention has
> been showed, so please give any feedback on this so we can start to work on
> our subprojects.
>
> Thanks.
>
