Re: Where to place "Spark + GemFire" connector.

Bruce Schuchardt Tue, 07 Jul 2015 16:10:25 -0700

+1

On 7/7/2015 3:58 PM, John Blum wrote:

There are a few Spring projects that are exemplary in their modularity
while contained within a single repo.  The core Spring Framework and
Spring Boot are 2 such projects that immediately come to mind.


However, this sort of disciplined modularity requires a very important
delineation of responsibilities / separation of concerns, reflected in the
organization (and cleanliness) of the codebase, combined with a very
well-understood set of principles and practices to ensure this level of
modularity is maintained; Geode has none of these things at the moment.  I
echo Kirk's early concerns about build times and testing, etc., not to
mention the gravity surrounding what is pertinent and what is not in order
to contribute to the "core" of Geode.

-j

On Tue, Jul 7, 2015 at 2:55 PM, William Markito <wmark...@pivotal.io> wrote:

Folks,

There are a lot of good and valuable points on this thread; however, we need
to discuss some practical actions here and maybe even see what other
projects have already done during their incubation.

For example, Apache Zeppelin (incubating) is also dependent on Spark and
what they do is select which version of Spark you're going to build
against.

https://github.com/apache/incubator-zeppelin/tree/master

Given that we don't yet have a single release, I'm not sure we should
already be that concerned about having and maintaining multiple sub
repositories or even sub projects.

That said, it doesn't mean we shouldn't be concerned about modularization;
it's just that we don't yet need releases of each independent module
trying to catch up with other projects' release cycles when our own
project has no release cycle yet.

IMHO, whenever a Geode release happens we may need to decide whether it's
going to support a specific Spark version or be built for and support
multiple versions (like Zeppelin).  That may also apply to HDFS, since we
also support HDFS but don't keep the HDFS integration in a separate
repository just to catch up with the HDFS release cycle. That doesn't mean
it shouldn't be modularized and developed as a separate project under Geode.

IOW, I'd vote to keep everything in the same repository, as different
projects, with modularized code and dependencies, and when the time comes,
after a couple of releases, if it makes sense, sure, break it into different
repositories or sub-projects that may grow by themselves.

To be honest, I'm not exactly sure why the modularization discussion has to
be so tied to repositories.  Spring or any other DI framework, for example,
allows you to write nice and decoupled code... not to mention other
techniques...

My 0.0.2 cents (following semantic versioning :) )

~/William

On Tue, Jul 7, 2015 at 1:33 PM, John Blum <jb...@pivotal.io> wrote:

Just a quick word on maintaining different (release) branches for main
dependencies (e.g. "driver" dependencies).  Again, this is exactly what
Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
has to be this way for Apache Geode and Pivotal GemFire given the fork in
the codebase and the current disparity between sga2 and develop.

However, this is not to say that the prior release branches will be
maintained indefinitely.  In fact, they are only maintained back to a
certain release of GemFire (currently, 7.0.2). So it looks a little
something like this...

SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).
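
The mapping above can be read as a simple lookup from an SDG release line to the GemFire version it targets. A minimal sketch in Python, with the table copied from the list above; the names `SDG_TO_GEMFIRE` and `gemfire_baseline` are hypothetical and not part of Spring Data GemFire:

```python
import re

# Release-line table copied from the list above (SDG line -> GemFire version).
SDG_TO_GEMFIRE = {
    "1.4": "7.0.2",
    "1.5": "7.0.2",
    "1.6": "8.0.0",
    "1.7": "8.1.0",
}


def gemfire_baseline(sdg_version: str) -> str:
    """Return the GemFire version that a given SDG version targets.

    Only the leading "major.minor" part is used, so "1.4.6" and
    "1.7 M1" resolve to the 1.4 and 1.7 release lines respectively.
    """
    match = re.match(r"(\d+)\.(\d+)", sdg_version)
    if match is None:
        raise ValueError(f"not a recognizable SDG version: {sdg_version!r}")
    release_line = f"{match.group(1)}.{match.group(2)}"
    try:
        return SDG_TO_GEMFIRE[release_line]
    except KeyError:
        raise ValueError(f"no known GemFire baseline for SDG {sdg_version}") from None


if __name__ == "__main__":
    for version in ("1.4.6", "1.5.3", "1.6.1", "1.7 M1"):
        print(f"SDG {version} -> GemFire {gemfire_baseline(version)}")
```

A user-facing compatibility check for a Geode Spark connector could take the same shape, with one row per supported release line.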

So, yes, that means I actively maintain 3-4 different versions of SDG,
though SDG 1.4.x has reached its EOL, and soon so will the SDG 1.5.x
line.

See the *Spring Data GemFire* project page
<http://projects.spring.io/spring-data-gemfire/> [0] for further details.

You can also see the *Spring Data GemFire* GitHub project
<https://github.com/spring-projects/spring-data-gemfire/releases> [1] for
release branches and tags as well.

-j

[0] - http://projects.spring.io/spring-data-gemfire/
[1] - https://github.com/spring-projects/spring-data-gemfire/releases


On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith <dsm...@pivotal.io> wrote:

To support different versions of Spark, wouldn't it be better to have a
single code base that has adapters for different versions of Spark? It
seems like that would be better than maintaining several active branches
with semi-duplicate code.
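
A minimal sketch of what the single-code-base-with-adapters idea could look like, written in Python for brevity. Every class and function name here is hypothetical; nothing below is an actual Geode or Spark API, and the Spark release lines 1.3 and 1.4 are used purely as illustrative keys:

```python
from abc import ABC, abstractmethod


class SparkAdapter(ABC):
    """Hypothetical seam: the only layer allowed to touch version-specific Spark APIs."""

    @abstractmethod
    def describe_write(self, region: str) -> str:
        """Describe how data would be written to a Geode region."""


class Spark13Adapter(SparkAdapter):
    def describe_write(self, region: str) -> str:
        return f"write to region '{region}' via the Spark 1.3 code path"


class Spark14Adapter(SparkAdapter):
    def describe_write(self, region: str) -> str:
        return f"write to region '{region}' via the Spark 1.4 code path"


# One registry entry per supported Spark release line (illustrative values).
ADAPTERS = {
    "1.3": Spark13Adapter,
    "1.4": Spark14Adapter,
}


def adapter_for(spark_version: str) -> SparkAdapter:
    """Pick the adapter matching the major.minor part of a Spark version."""
    release_line = ".".join(spark_version.split(".")[:2])
    try:
        return ADAPTERS[release_line]()
    except KeyError:
        raise ValueError(f"unsupported Spark version: {spark_version}") from None


def save_to_geode(spark_version: str, region: str) -> str:
    """Version-independent connector code: it never branches on the Spark version."""
    return adapter_for(spark_version).describe_write(region)


if __name__ == "__main__":
    print(save_to_geode("1.3.1", "customers"))
    print(save_to_geode("1.4.0", "customers"))
```

Supporting a new Spark release line then means adding one adapter class and one registry entry, rather than opening another long-lived branch.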

I do think it would be better to keep the Geode Spark connector in a
separate repository with a separate release cycle, for all of the reasons
outlined on this thread (don't bloat the Geode codebase, modularity, etc.).
But I think there is also value in keeping it in the Apache community and
managing it through the Apache process. I'm not sure how "just put it on
GitHub" would work out. Maybe it's just a matter of making it through the
pain of the restrictive incubation process until we can split this code
out, and in the meantime keeping it as loosely coupled as possible.

-Dan

On Tue, Jul 7, 2015 at 11:57 AM, John Blum <jb...@pivotal.io> wrote:

+1 - Bingo, that tis the question.

Part of the answer lies in having a planned, predictable and consistent
cadence of releases.

E.g. the *Spring Data* project <http://projects.spring.io/spring-data/> [0]
is an umbrella project managing 12 individual modules (e.g. SD... JPA,
Mongo, Redis, Neo4j, GemFire, Cassandra, etc., dubbed the "release train"),
which are all at different versions and all have different external,
critical (driver) dependencies.  The only dependen(cy|cies) all SD modules
have in common is the version of the core *Spring Framework* and the
version of *Spring Data Commons*.  Otherwise, individual modules upgrade
their "driver" dependencies at different cycles, possibly in different
"release trains", but only when the current release train is released
(~every 4 weeks).  See the SD Wiki
<https://github.com/spring-projects/spring-data-commons/wiki/Release-planning>
[1] for more details.

[0] - http://projects.spring.io/spring-data/
[1] - https://github.com/spring-projects/spring-data-commons/wiki/Release-planning


On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase <gch...@pivotal.io> wrote:

More important than easy to develop is easy to pick up and use.

Improving the new user experience is something that needs attention from
Geode.  How we develop and provide Spark integration needs to take this
into account.

Once we are able to provide official releases, how can a user know and make
sure they are getting the correct plug-in version, and have relatively up
to date support for latest Geode and Spark versions?

That to me is the requirement we should be designing for first in our
development process.

On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:

On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade <aging...@pivotal.io> wrote:

Agree... And that's the point... The connector code needs to catch up with
the Spark release train; if it's part of Geode then the Geode releases need
to happen as often as Spark releases (along with other planned Geode
releases)...

I don't think it is a realistic goal to have that many actively
supported branches of the Geode Spark connector.

Look, I've been around the Hadoop ecosystem for years. Nowhere is the
problem of integration with upstream as present as in the Hadoop ecosystem
(everything depends on everything else and everything evolves like crazy).
I haven't seen a single project in that ecosystem that would be able to
support a blanket statement like the above. Maybe Geode has resources that
guys depending on something like HBase simply don't have.

Thanks,
Roman.



--
Greg Chase

Director of Big Data Communities
http://www.pivotal.io/big-data

Pivotal Software
http://www.pivotal.io/

650-215-0477
@GregChase
Blog: http://geekmarketing.biz/



--
-John
503-504-8657
john.blum10101 (skype)






--

William Markito Oliveira
Enterprise Architect
-- For questions about Apache Geode, please write to
*dev@geode.incubator.apache.org
<dev@geode.incubator.apache.org>*
