For clarification... what I specifically mean by "level of modularity" is reflected in the dependencies between modules. The POM distinguishes required from non-required dependencies via the dependency "scope" (e.g. 'compile' vs. 'runtime' vs. 'test') and the 'optional' flag.
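As a concrete illustration of that distinction, here is a minimal POM fragment; the artifact versions are illustrative, not a recommendation:

```xml
<dependencies>
  <!-- required: consumers get this transitively -->
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>4.1.6.RELEASE</version>
    <scope>compile</scope>
  </dependency>
  <!-- non-required: not pulled in transitively; consumers who want the
       feature declare it themselves -->
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-tx</artifactId>
    <version>4.1.6.RELEASE</version>
    <optional>true</optional>
  </dependency>
  <!-- needed only to run this module's tests; never leaks to consumers -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```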
If you look at the Maven POM files in Spring, you will begin to understand how bits and pieces of the framework can be used independently of the other pieces, how features are only enabled if certain classes are detected on the classpath, etc. E.g., I can use Spring DI independently of the Spring Transaction Management infrastructure, or vice versa; those two are actually quite unrelated, but they complement each other when combined with AOP, for instance. By way of example, "persistence" is one concern that could be modularized and made pluggable (write to the oplog, write to HDFS, or forgo both and write to the underlying RDBMS, whatever), enabled by adding the corresponding JAR for the desired behavior. Anyway, food for thought.

-j

On Tue, Jul 7, 2015 at 3:58 PM, John Blum <jb...@pivotal.io> wrote:

> There are a few Spring projects that are exemplary in their modularity,
> contained within a single repo. The core Spring Framework and Spring
> Boot are 2 such projects that immediately come to mind.
>
> However, this sort of disciplined modularity requires a very important
> delineation of responsibilities / separation of concerns reflected in
> the organization (and cleanliness) of the codebase, combined with a very
> well-understood set of principles and practices to ensure this level of
> modularity is maintained; Geode has none of these things at the moment.
> I echo Kirk's early concerns about build times and testing, etc., not to
> mention the gravity surrounding what is pertinent and what is not in
> order to contribute to the "core" of Geode.
>
> -j
>
> On Tue, Jul 7, 2015 at 2:55 PM, William Markito <wmark...@pivotal.io>
> wrote:
>
>> Folks,
>>
>> There are a lot of good and valuable points on this thread; however, we
>> need to discuss some practical actions here and maybe even see what
>> other projects have already done during their incubation.
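[The classpath-detection technique mentioned at the top of this message (features enabled only if certain classes are present) can be sketched roughly as below; the class and feature names are hypothetical, not Geode's actual API:]

```java
public final class FeatureDetector {

    // Returns true if the named class can be loaded, i.e. the
    // corresponding JAR is on the classpath. The 'false' argument
    // avoids running static initializers during the probe.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, FeatureDetector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical example: enable HDFS persistence only when the
        // Hadoop client JAR has been added to the classpath.
        if (isPresent("org.apache.hadoop.fs.FileSystem")) {
            System.out.println("HDFS persistence enabled");
        } else {
            System.out.println("falling back to local persistence");
        }
    }
}
```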
>>
>> For example, Apache Zeppelin (incubating) is also dependent on Spark,
>> and what they do is let you select which version of Spark you're going
>> to build against.
>>
>> https://github.com/apache/incubator-zeppelin/tree/master
>>
>> Given that we don't yet have a single release, I'm not sure we should
>> already be that concerned about having and maintaining multiple sub
>> repositories or even sub projects.
>>
>> That said, it doesn't mean we shouldn't be concerned about
>> modularization; it's just that we don't yet need to have releases of
>> each independent module trying to catch up with other projects' release
>> cycles without having a release cycle for our own project yet.
>>
>> IMHO, whenever a Geode release happens, we may need to decide if it's
>> going to support a specific Spark version or be built against and
>> support multiple versions (like Zeppelin). That may also apply to HDFS,
>> since we also support HDFS but we don't have the HDFS integration as a
>> separate repository just in order to catch up with the HDFS release
>> cycle. It doesn't mean it shouldn't be modularized and developed as a
>> separate project under Geode.
>>
>> IOW, I'd vote to keep everything in the same repository, as different
>> projects, with modularized code and dependencies, and when the time
>> comes, after a couple of releases, if it makes sense, sure, break it
>> into different repositories or sub-projects that may grow by
>> themselves.
>>
>> To be honest, I'm not exactly sure why the modularization discussion
>> has to be so tied to repositories. Spring or any other DI framework,
>> for example, allows you to write nice and decoupled code... not to
>> mention other techniques...
>>
>> My 0.0.2 cents (following semantic versioning :) )
>>
>> ~/William
>>
>> On Tue, Jul 7, 2015 at 1:33 PM, John Blum <jb...@pivotal.io> wrote:
>>
>> > Just a quick word on maintaining different (release) branches for
>> > main dependencies (e.g. "driver" dependencies).
>> > Again, this is exactly what Spring Data GemFire does to support
>> > GemFire, and now Geode. In fact, it has to be this way for Apache
>> > Geode and Pivotal GemFire, given the fork in the codebase and the
>> > current disparity between sga2 and develop.
>> >
>> > However, this is not to say that the prior release branches will be
>> > maintained indefinitely. In fact, they are only maintained back to a
>> > certain release of GemFire (currently, 7.0.2). So it looks a little
>> > something like this...
>> >
>> > SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
>> > SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
>> > SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
>> > SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1)
>> >
>> > So, yes, that means I actively maintain 3-4 different versions of
>> > SDG, though SDG 1.4.x has reached its EOL, and soon too will the
>> > SDG 1.5.x line.
>> >
>> > See the *Spring Data GemFire* project page
>> > <http://projects.spring.io/spring-data-gemfire/> [0] for further
>> > details.
>> >
>> > You can also see the *Spring Data GemFire* GitHub project
>> > <https://github.com/spring-projects/spring-data-gemfire/releases> [1]
>> > for release branches and tags as well.
>> >
>> > -j
>> >
>> > [0] - http://projects.spring.io/spring-data-gemfire/
>> > [1] - https://github.com/spring-projects/spring-data-gemfire/releases
>> >
>> >
>> > On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith <dsm...@pivotal.io> wrote:
>> >
>> > > To support different versions of Spark, wouldn't it be better to
>> > > have a single code base that has adapters for different versions of
>> > > Spark? It seems like that would be better than maintaining several
>> > > active branches with semi-duplicate code.
>> > >
>> > > I do think it would be better to keep the Geode Spark connector in
>> > > a separate repository with a separate release cycle, for all of the
>> > > reasons outlined on this thread (don't bloat the Geode codebase,
>> > > modularity, etc.). But I think there is also value in keeping it in
>> > > the Apache community and managing it through the Apache process.
>> > > I'm not sure how "just put it on GitHub" would work out. Maybe it's
>> > > just a matter of making it through the pain of the restrictive
>> > > incubation process until we can split this code out, and in the
>> > > meantime keeping it as loosely coupled as possible.
>> > >
>> > > -Dan
>> > >
>> > > On Tue, Jul 7, 2015 at 11:57 AM, John Blum <jb...@pivotal.io>
>> > > wrote:
>> > >
>> > > > +1 - Bingo, that 'tis the question.
>> > > >
>> > > > Part of the answer lies in having a planned, predictable, and
>> > > > consistent cadence of releases.
>> > > >
>> > > > E.g. the *Spring Data* project
>> > > > <http://projects.spring.io/spring-data/> [0] is an umbrella
>> > > > project managing 12 individual modules (e.g. SD... JPA, Mongo,
>> > > > Redis, Neo4j, GemFire, Cassandra, etc., dubbed the "release
>> > > > train"), which are all at different versions and all have
>> > > > different external, critical (driver) dependencies. The only
>> > > > dependencies all SD modules have in common are the version of the
>> > > > core *Spring Framework* and the version of *Spring Data Commons*.
>> > > > Otherwise, individual modules upgrade their "driver" dependencies
>> > > > in different cycles, possibly in a different "release train", but
>> > > > only when the current release train is released (~every 4 weeks).
>> > > > See the SD Wiki
>> > > > <https://github.com/spring-projects/spring-data-commons/wiki/Release-planning>
>> > > > [1] for more details.
>> > > >
>> > > > [0] - http://projects.spring.io/spring-data/
>> > > > [1] - https://github.com/spring-projects/spring-data-commons/wiki/Release-planning
>> > > >
>> > > >
>> > > > On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase
>> > > > <gch...@pivotal.io> wrote:
>> > > >
>> > > > > More important than easy to develop is easy to pick up and use.
>> > > > >
>> > > > > Improving the new user experience is something that needs
>> > > > > attention from Geode. How we develop and provide Spark
>> > > > > integration needs to take this into account.
>> > > > >
>> > > > > Once we are able to provide official releases, how can a user
>> > > > > know and make sure they are getting the correct plug-in
>> > > > > version, and have relatively up-to-date support for the latest
>> > > > > Geode and Spark versions?
>> > > > >
>> > > > > That, to me, is the requirement we should be designing for
>> > > > > first in our development process.
>> > > > >
>> > > > > On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik
>> > > > > <ro...@shaposhnik.org> wrote:
>> > > > >
>> > > > > > On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade
>> > > > > > <aging...@pivotal.io> wrote:
>> > > > > > > Agree... And that's the point... The connector code needs
>> > > > > > > to catch up with the Spark release train; if it's part of
>> > > > > > > Geode, then Geode releases need to happen as often as Spark
>> > > > > > > releases (along with other planned Geode releases)...
>> > > > > >
>> > > > > > I don't think it is a realistic goal to have that many
>> > > > > > actively supported branches of the Geode Spark connector.
>> > > > > >
>> > > > > > Look, I've been around the Hadoop ecosystem for years.
>> > > > > > Nowhere is the problem of integration with upstream as
>> > > > > > present as in the Hadoop ecosystem (everything depends on
>> > > > > > everything else, and everything evolves like crazy). I
>> > > > > > haven't seen a single project in that ecosystem that would be
>> > > > > > able to support a blanket statement like the above. Maybe
>> > > > > > Geode has resources that guys depending on something like
>> > > > > > HBase simply don't have.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Roman.
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Greg Chase
>> > > > >
>> > > > > Director of Big Data Communities
>> > > > > http://www.pivotal.io/big-data
>> > > > >
>> > > > > Pivotal Software
>> > > > > http://www.pivotal.io/
>> > > > >
>> > > > > 650-215-0477
>> > > > > @GregChase
>> > > > > Blog: http://geekmarketing.biz/
>> > > >
>> > > > --
>> > > > -John
>> > > > 503-504-8657
>> > > > john.blum10101 (skype)
>> >
>> > --
>> > -John
>> > 503-504-8657
>> > john.blum10101 (skype)
>>
>> --
>> William Markito Oliveira
>> Enterprise Architect
>> --
>> For questions about Apache Geode, please write to
>> *dev@geode.incubator.apache.org <dev@geode.incubator.apache.org>*
>
> --
> -John
> 503-504-8657
> john.blum10101 (skype)

--
-John
503-504-8657
john.blum10101 (skype)