> On Jan 14, 2014, at 8:37 PM, Benoit Chesneau <bchesn...@gmail.com> wrote:
> 
> On Wed, Jan 15, 2014 at 12:22 AM, Paul Davis
> <paul.joseph.da...@gmail.com> wrote:
> 
> >> I've recently been having discussions about how to handle the
> >> repository configuration for various bits of CouchDB post-merge. The
> >> work that Benoit has been doing on the rcouch merge branch has also
> >> touched on this topic.
>> 
>> The background for those unfamiliar is that the standard operating
>> procedure for Erlang is to have a single Erlang application per
>> repository and then rely on rebar to fetch each dependency.
>> Traditionally in CouchDB land we've always just included the source to
>> all applications in a single monolithic repository and periodically
>> reimport changes from upstream dependencies.
>> 
> >> Recently rcouch changed from the monolithic repository to use external
> >> repositories for some dependencies. Originally BigCouch used an
> >> even more federated scheme that had each Erlang application in an
> >> external repository (and the core couch Erlang application was in the
> >> root repository). When Bob Newson and I did the initial hacking on the
> >> BigCouch merge, we pulled those external dependencies into the root
> >> repository, reverting back to the large monolithic approach.
>> 
> >> After trying to deal with the merge and contemplating how various
> >> Erlang release things might work, it's become fairly apparent that the
> >> monolithic approach is a bit constrictive. For instance, rebar's
> >> versioning support lets you tag repositories to generate
> >> versions rather than manually updating versions in source files.
> >> Another thing I've found on other projects is that having each
> >> application in a separate repository requires developers to think in a
> >> bit more detail about the public internal interfaces used throughout
> >> the system. We've already done some work to this end by
> >> separating source directories, but having to commit to multiple
> >> repositories for a single change raises a big red flag that there may
> >> be a high level of coupling between two bits of code.
>> 
> >> Another benefit of the multiple-repository setup is that it
> >> possibly lends itself to integration with the proposed
> >> plugin system. It'd be fairly trivial to have a script that went and
> >> fetched plugins that aren't developed at Apache (as a ./configure-time
> >> switch type of thing). A system like this would also allow groups
> >> focused on particular areas of development to not have to
> >> concern themselves with the unrelated parts of the system.
>> 
> >> Given all that, I'd like to propose that we move to having a
> >> repository for each application/dependency that we use to build
> >> CouchDB. Each repository would be hosted on ASF infra and mirrored to
> >> GitHub as expected. This means that the root repository could
> >> be a simple repo containing packaging/release/build stuff, which
> >> would enable lots of the ideas offered on configurable types of
> >> release generation. I've included an initial list of repositories at
> >> the end of this email. It's basically just the apps that have been
> >> split out in either rcouch or bigcouch, plus a few other bits from
> >> CouchDB master.
>> 
> >> I would also point out that even though our main repo would need to
> >> fetch other dependencies from the internet to build the final output,
> >> we fully intend that our release tarballs would *not* have this
> >> requirement. I.e., when we cut a release, part of the process the
> >> RM (release manager) runs would be to pull all of those dependencies
> >> before creating a wholly self-contained tarball. Given an
> >> apache-couchdb-x.y.z.tar.gz release file, there won't be a requirement
> >> to have access to the ASF git repos.
>> 
> >> I'm not entirely sure how controversial this is for anyone. For the
> >> most part, the reactions I remember hearing were more concerned with
> >> whether the infrastructure team would allow us to use this sort of
> >> configuration. I looked into it yesterday and asked, and apparently it's
> >> something we can request, but as always we'll want to verify again if
> >> we have consensus to move in this direction.
>> 
> >> Anyone have comments or flames? Right now I'm just interested in
> >> feeling out what sort of (lack of?) consensus there is on such a
> >> change. If there's general consensus, I'd think we'd do a vote in a
> >> couple of weeks, and if that passes, start down this road for the
> >> two merge projects. It would then become part of master once those
> >> land (as opposed to doing this to master and then attempting to merge
> >> rcouch/bigcouch onto that somehow).
>> 
>> 
> >> This is a quick pass at listing what extra repositories I'd have
> >> created. Some of these applications only exist in the bigcouch and/or
> >> rcouch branches, so that's where the unfamiliar application names are
> >> from. I'd also point out that the documentation and fauxton entries are
> >> just on a whim, in that we could decouple that development from the
> >> Erlang development. I can see arguments for and against those. I'm much
> >> less concerned about that aspect than about the Erlang parts that are
> >> directly affected by rebar/Erlang conventions.
>> 
>>    chttpd
>>    config
>>    couch
>>    couch_collate
>>    couch_dbupdates
>>    couch_httpd
>>    couch_index
>>    couch_mrview
>>    couch_plugins
>>    couch_replicator
>>    documentation
>>    ddoc_cache
>>    ets_lru
>>    fabric
>>    fauxton
>>    ibrowse
>>    jiffy
>>    mem3
>>    mochiweb
>>    oauth
>>    rebar
>>    rexi
>>    snappy
>>    twig
> 
> 
> I also contemplated this and I am generally +1 on it, and definitely
> +1 on mirroring them on the Apache git if possible. I have a couple of
> comments though.
> 
> Initially I also had everything separated into its own source repository. A
> year ago I merged the CouchDB Erlang applications back into one core repo and
> put all the dependencies in the refuge repository, or in the refuge CDN for
> the SpiderMonkey and ICU sources.
> 
> I merged the CouchDB Erlang applications back into one core repo because they
> were a little too dependent on each other, especially couch_httpd, couch_index
> and couch_mrview. These applications are not yet self-sufficient.
> 
> IMO, if we split everything into its own app, then we should make sure
> that couch_httpd can be used without couch_index and couch_mrview (which
> means that "all_docs" is available in couch_httpd). We should also be able
> to launch couch without any of the above, and probably without needing an
> ini file. The couch_query_server module is an interesting case.
> BigCouch is also introducing `ddoc_cache`, and I am not sure why it is
> provided as a standalone app. Does that mean it could eventually be replaced
> by another application? Why not simply have it in the couch application?
> Does it need to be updated separately?
> 
> Also, all our base applications should be namespaced correctly so that
> they are unambiguously identified as our Erlang modules: "config" is too
> generic, and so is "ddoc_cache". The others are probably OK.
> 
> There are probably other things that we could provide as apps:
> 
> - couch_daemon
> - couch_js
> - couch_external
> - couch_stats
> - couch_compaction_daemon
> - couch_httpd_proxy
> 
> Anyway, again, I'm +1 on this move. I really think it's a good idea.
> 
> - benoit

I agree with most of this. Roughly, I see a few general points.

First, deciding whether some things are external deps is definitely up for
discussion. Whether couch_mrview is a different app/repo is not necessarily
clear-cut. Personally, I think I over-engineered couch_index, which blurs the
lines a bit. If I could wave a wand, I'd have just couch_mrview and it'd be
separate. More importantly, I think the separate repos make these things more
apparent. The fact that we're discussing this sort of architecture question
suggests that it's forcing us to think a bit harder.

Second is the aspect of composability. For instance, to me mrview obviously
belongs in a different repo precisely so that a user could import couch (_core?)
directly without requiring the SpiderMonkey dependency. The monolithic repo
doesn't allow this without some very non-standard tooling.

Thirdly, app naming is always a point of contention. The config name was
actually driven by a hot code upgrade concern: we couldn't reuse couch_config
directly at the time. Adam was also hopeful we could turn it into a useful
non-specific config app.

Fourthly, and related to the second point, we'll also want to look at splitting
other apps out as necessary. I think the ones you listed aren't controversial;
it's just that no one has done it yet. My list was purely what existed so far,
without attempting to carve things up more. I definitely agree we should carve
more; I just wanted to establish consensus that carving is the right direction.

Fifthly, I'm done typing on my phone. I'll fill in more thoughts tomorrow. 
