Folks,

There are a lot of good and valuable points on this thread; however, we
need to discuss some practical actions here and maybe even see what other
projects have already done during their incubation.

For example, Apache Zeppelin (incubating) also depends on Spark, and what
they do is let you select which version of Spark to build against.

https://github.com/apache/incubator-zeppelin/tree/master
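A hedged sketch of that approach: the Spark version is chosen at build time
via Maven profiles/properties. The profile ids and versions below are
illustrative, not copied from Zeppelin's actual pom.xml:

```xml
<!-- Illustrative pom.xml fragment: one profile per supported Spark line.
     Profile ids and version numbers here are hypothetical examples. -->
<profiles>
  <profile>
    <id>spark-1.3</id>
    <properties>
      <spark.version>1.3.1</spark.version>
    </properties>
  </profile>
  <profile>
    <id>spark-1.4</id>
    <properties>
      <spark.version>1.4.0</spark.version>
    </properties>
  </profile>
</profiles>
```

Building against a given Spark line would then be a matter of picking a
profile, e.g. `mvn clean package -Pspark-1.4`.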

Given that we don't yet have a single release, I'm not sure we should
already be that concerned about having and maintaining multiple
sub-repositories or even sub-projects.

That said, it doesn't mean we shouldn't be concerned about modularization;
it's just that we don't yet need releases of each independent module
trying to catch up with other projects' release cycles when we don't even
have a release cycle for our own project yet.

IMHO, whenever a Geode release happens we may need to decide whether it's
going to support a specific Spark version or be built against and support
multiple versions (like Zeppelin).  That may also apply to HDFS: we
support HDFS, but we don't keep the HDFS integration in a separate
repository just to catch up with the HDFS release cycle. That doesn't mean
it shouldn't be modularized and developed as a separate project under
Geode.

IOW, I'd vote to keep everything in the same repository, as different
projects, with modularized code and dependencies, and when the time comes,
after a couple of releases, if it makes sense, sure, break it into
different repositories or sub-projects that can grow on their own.

To be honest, I'm not exactly sure why the modularization discussion has
to be so tied to repositories.  Spring or any other DI framework, for
example, allows you to write nice, decoupled code... not to mention other
techniques...
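To make that concrete, here's a minimal sketch (all class names are
hypothetical, nothing here is from the actual Geode or Spark APIs) of what
"decoupled within one repository" can look like: the core code depends only
on an interface, and the Spark-specific implementation can live in its own
module and be swapped per Spark version without touching the core.

```java
// Lives in the core module; knows nothing about Spark.
interface RegionSink {
    void put(String key, String value);
}

// Would live in a geode-spark-connector module, compiled against a
// chosen Spark version; here it is just an in-memory stand-in.
class InMemorySink implements RegionSink {
    final java.util.Map<String, String> store = new java.util.HashMap<>();
    @Override
    public void put(String key, String value) {
        store.put(key, value);
    }
}

public class DecouplingSketch {
    // Core logic is written against the interface, so replacing the
    // Spark-backed implementation never changes this code.
    static void ingest(RegionSink sink) {
        sink.put("k1", "v1");
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        ingest(sink);
        System.out.println(sink.store.get("k1")); // prints "v1"
    }
}
```

The same shape works whether the wiring is done by Spring, another DI
container, or plain constructor injection.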

My 0.0.2 cents (following semantic versioning :) )

~/William

On Tue, Jul 7, 2015 at 1:33 PM, John Blum <jb...@pivotal.io> wrote:

> Just a quick word on maintaining different (release) branches for main
> dependencies (e.g. "driver" dependencies).  Again, this is exactly what
> Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
> has to be this way for Apache Geode and Pivotal GemFire given the fork in
> the codebase and the current disparity between sga2 and develop.
>
> However, this is not to say that the prior release branches will be
> maintained indefinitely.  In fact, they are only maintained back to a
> certain release of GemFire (currently, 7.0.2). So it looks a little
> something like this...
>
> SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
> SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
> SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
> SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).
>
> So, yes, that means I actively maintain 3-4 different versions of SDG,
> though SDG 1.4.x has reached its EOL, and soon too will the SDG 1.5.x
> line.
>
> See the *Spring Data GemFire* project page
> <http://projects.spring.io/spring-data-gemfire/> [0] for further details.
>
> You can also see the *Spring Data GemFire* GitHub project
> <https://github.com/spring-projects/spring-data-gemfire/releases> [1] for
> release branches and tags as well.
>
> -j
>
> [0] - http://projects.spring.io/spring-data-gemfire/
> [1] - https://github.com/spring-projects/spring-data-gemfire/releases
>
>
> On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith <dsm...@pivotal.io> wrote:
>
> > To support different versions of Spark, wouldn't it be better to have a
> > single code base that has adapters for different versions of Spark? It
> > seems like that would be better than maintaining several active branches
> > with semi-duplicate code.
> >
> > I do think it would be better to keep the Geode Spark connector in a
> > separate repository with a separate release cycle, for all of the reasons
> > outlined on this thread (don't bloat the Geode codebase, modularity,
> > etc.).
> > But I think there is also value in keeping it in the Apache community and
> > managing it through the Apache process. I'm not sure how "just put it on
> > GitHub" would work out. Maybe it's just a matter of making it through the
> > pain of the restrictive incubation process until we can split this code
> > out. And in the meantime keeping it as loosely coupled as possible.
> >
> > -Dan
> >
> > On Tue, Jul 7, 2015 at 11:57 AM, John Blum <jb...@pivotal.io> wrote:
> >
> > > +1 - Bingo, that is the question.
> > >
> > > Part of the answer lies in having a planned, predictable, and
> > > consistent cadence of releases.
> > >
> > > E.g. the *Spring Data* project
> > > <http://projects.spring.io/spring-data/> [0]
> > > is an umbrella project managing 12 individual modules (e.g. SD... JPA,
> > > Mongo, Redis, Neo4j, GemFire, Cassandra, etc, dubbed the "release
> train")
> > > which all are at different versions and all have different external,
> > > critical (driver) dependencies.  The only dependen(cy|cies) all SD
> > modules
> > > have in common is the version of the core *Spring Framework* and the
> > > version of *Spring Data Commons*.  Otherwise, individual modules upgrade
> > > their "driver" dependencies at different cycles, possibly in a different
> > > "release train", but only when the current release train is released
> > > (~every 4 weeks).  See the SD Wiki
> > > <https://github.com/spring-projects/spring-data-commons/wiki/Release-planning>
> > > [1] for more details.
> > >
> > > [0] - http://projects.spring.io/spring-data/
> > > [1] - https://github.com/spring-projects/spring-data-commons/wiki/Release-planning
> > >
> > >
> > > On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase <gch...@pivotal.io>
> > wrote:
> > >
> > > > More important than easy to develop is easy to pick up and use.
> > > >
> > > > Improving the new user experience is something that needs attention
> > from
> > > > Geode.  How we develop and provide Spark integration needs to take
> this
> > > > into account.
> > > >
> > > > Once we are able to provide official releases, how can a user know
> > > > and make sure they are getting the correct plug-in version, with
> > > > relatively up-to-date support for the latest Geode and Spark
> > > > versions?
> > > >
> > > > That to me is the requirement we should be designing for first in our
> > > > development process.
> > > >
> > > > On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik <
> > ro...@shaposhnik.org>
> > > > wrote:
> > > >
> > > > > On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade <
> > > aging...@pivotal.io>
> > > > > wrote:
> > > > > > Agree... And that's the point... The connector code needs to
> > > > > > catch up with the Spark release train; if it's part of Geode,
> > > > > > then Geode releases need to happen as often as Spark releases
> > > > > > (along with other planned Geode releases)...
> > > > >
> > > > > I don't think it is a realistic goal to maintain that many actively
> > > > > supported branches of the Geode Spark connector.
> > > > >
> > > > > Look, I've been around the Hadoop ecosystem for years. Nowhere is
> > > > > the problem of integration with upstream as pressing as in the
> > > > > Hadoop ecosystem (everything depends on everything else and
> > > > > everything evolves like crazy). I haven't seen a single project in
> > > > > that ecosystem that would be able to support a blanket statement
> > > > > like the above. Maybe Geode has resources that folks depending on
> > > > > something like HBase simply don't have.
> > > > >
> > > > > Thanks,
> > > > > Roman.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Greg Chase
> > > >
> > > > Director of Big Data Communities
> > > > http://www.pivotal.io/big-data
> > > >
> > > > Pivotal Software
> > > > http://www.pivotal.io/
> > > >
> > > > 650-215-0477
> > > > @GregChase
> > > > Blog: http://geekmarketing.biz/
> > > >
> > >
> > >
> > >
> > > --
> > > -John
> > > 503-504-8657
> > > john.blum10101 (skype)
> > >
> >
>
>
>
> --
> -John
> 503-504-8657
> john.blum10101 (skype)
>



-- 

William Markito Oliveira
Enterprise Architect
-- For questions about Apache Geode, please write to
*dev@geode.incubator.apache.org
<dev@geode.incubator.apache.org>*
