+1 go for it Paul.
I was wondering if the namespace is important -- e.g. will you rename
jiffy to couch_jiffy (or whatever) so that it can be hosted at
git://git.apache.org/couchdb-jiffy.git?
Or would it be better to put them at e.g.
git://git.couchdb.apache.org/jiffy.git or any other more appropriate
url? We've got a lot of these, rebar, ets_lru and more.
On 16 January 2014 20:42, Paul Davis <[email protected]> wrote:
> It doesn't appear that this is objectionable to anyone. Does anyone
> have an objection to us having infra/me create these repos to use for
> the bigcouch/rcouch merge work? This won't affect master or releases
> until those merges finish.
>
> On Tue, Jan 14, 2014 at 11:02 PM, Paul J Davis
> <[email protected]> wrote:
>>
>>
>>> On Jan 14, 2014, at 8:37 PM, Benoit Chesneau <[email protected]> wrote:
>>>
>>> On Wed, Jan 15, 2014 at 12:22 AM, Paul Davis
>>> <[email protected]> wrote:
>>>
>>>> I've recently been having discussions about how to handle the
>>>> repository configuration for various bits of CouchDB post-merge. The
>>>> work that Benoit has been doing on the rcouch merge branch has also
>>>> touched on this topic.
>>>>
>>>> The background for those unfamiliar is that the standard operating
>>>> procedure for Erlang is to have a single Erlang application per
>>>> repository and then rely on rebar to fetch each dependency.
>>>> Traditionally in CouchDB land we've always just included the source to
>>>> all applications in a single monolithic repository and periodically
>>>> reimport changes from upstream dependencies.
>>>>
>>>> Recently rcouch changed from the monolithic repository to use external
>>>> repositories for some dependencies. Originally BigCouch used an
>>>> even more federated scheme that had each Erlang application in an
>>>> external repository (and the core couch Erlang application was in the
>>>> root repository). When Bob Newson and I did the initial hacking on the
>>>> BigCouch merge we pulled those external dependencies into the root
>>>> repository, reverting to the large monolithic approach.
>>>>
>>>> After trying to deal with the merge and contemplating how various
>>>> Erlang release things might work it's become fairly apparent that the
>>>> monolithic approach is a bit constrictive. For instance, part of
>>>> rebar's versioning abilities lets you tag repositories to generate
>>>> versions rather than manually updating versions in source files.
>>>> Another thing I've found on other projects is that having each
>>>> application in a separate repository requires developers to think in
>>>> a bit more detail about the public internal interfaces used
>>>> throughout the system. We've done some work to this end already with
>>>> separating source directories but forcing commits to multiple
>>>> repositories shoots up a big red flag that maybe there's a high level
>>>> of coupling between two bits of code.
>>>>
>>>> Another benefit of the multiple repository setup is that it could
>>>> lend itself to being integrated with the proposed plugin system.
>>>> It'd be fairly trivial to have a script that went and
>>>> fetched plugins that aren't developed at Apache (as a ./configure time
>>>> switch type of thing). Having a system like this would also allow us
>>>> to have groups focused on particular bits of development not have to
>>>> concern themselves with the unrelated parts of the system.
>>>>
>>>> Given all that, I'd like to propose that we move to having a
>>>> repository for each application/dependency that we use to build
>>>> CouchDB. Each repository would be hosted on ASF infra and mirrored to
>>>> GitHub as expected. This means that we could have the root repository
>>>> be a simple repo that contains packaging/release/build stuff that
>>>> would enable lots of the ideas offered on configurable types of
>>>> release generation. I've included an initial list of repositories at
>>>> the end of this email. It's basically just the apps that have been
>>>> split out in either rcouch or bigcouch plus a few other bits from
>>>> CouchDB master.
>>>>
>>>> I would also point out that even though our main repo would need to
>>>> fetch other dependencies from the internet to build the final output,
>>>> we fully intend that our release tarballs would *not* have this
>>>> requirement. I.e., when we go to cut a release, part of the process
>>>> the RM would run would be to pull all of those dependencies before
>>>> creating a tarball that would be wholly self-contained. Given an
>>>> apache-couchdb-x.y.z.tar.gz release file, there won't be a requirement
>>>> to have access to the ASF git repos.
>>>>
>>>> I'm not entirely sure how controversial this is for anyone. For the
>>>> most part the reactions I remember hearing were more concerned with
>>>> whether the infrastructure team would allow us to use this sort of
>>>> configuration. I looked yesterday and asked and apparently it's
>>>> something we can request but as always we'll want to verify again if
>>>> we have consensus to move in this direction.
>>>>
>>>> Anyone have comments or flames? Right now I'm just interested in
>>>> feeling out what sort of (lack of?) consensus there is on such a
>>>> change. If there's general consensus I'd think we'd do a vote in a
>>>> couple weeks and if that passes then start on down this road for the
>>>> two merge projects and then it would become part of master once those
>>>> land (as opposed to doing this to master and then attempting to merge
>>>> rcouch/bigcouch onto that somehow).
>>>>
>>>>
>>>> This is a quick pass at listing what extra repositories I'd have
>>>> created. Some of these applications only exist in the bigcouch and/or
>>>> rcouch branches so that's where the unfamiliar application names are
>>>> from. I'd also point out that the documentation and fauxton things are
>>>> just on a whim in that we could decouple that development from the
>>>> erlang development. I can see arguments for and against those. I'm much
>>>> less concerned about that aspect than the Erlang parts that are directly
>>>> affected by rebar/Erlang conventions.
>>>>
>>>> chttpd
>>>> config
>>>> couch
>>>> couch_collate
>>>> couch_dbupdates
>>>> couch_httpd
>>>> couch_index
>>>> couch_mrview
>>>> couch_plugins
>>>> couch_replicator
>>>> documentation
>>>> ddoc_cache
>>>> ets_lru
>>>> fabric
>>>> fauxton
>>>> ibrowse
>>>> jiffy
>>>> mem3
>>>> mochiweb
>>>> oauth
>>>> rebar
>>>> rexi
>>>> snappy
>>>> twig
>>>
>>>
>>> I also contemplated this and I am generally +1 on this. And definitely
>>> +1 to mirror them on the apache git if possible. I have a couple of
>>> comments though.
>>>
>>> Initially I also had everything separated in its own source repository. 1
>>> year ago I merged back as one core repo the couchdb erlang applications and
>>> put all the dependencies in the refuge repository or in the refuge CDN for
>>> the spidermonkey and ICU sources.
>>>
>>> I merged back as one core repo the couchdb erlang applications because they
>>> were a little too interdependent, especially couch_httpd, couch_index and
>>> couch_mrview. These applications cannot yet stand on their own.
>>>
>>> Imo if we split everything into its own app, then we should make sure
>>> that couch_httpd can be used without couch_index and couch_mrview (which
>>> means that "all_docs" is available in couch_httpd). Also we should be able
>>> to just launch couch without any of the above. And probably without the
>>> need of an ini. The couch_query_server module thing is an interesting case.
>>> bigcouch is also introducing `ddoc_cache` and I am not sure why it is
>>> provided as a standalone app. Does it mean it can be replaced by another
>>> application eventually? Why not simply have it in the couch application?
>>> Does it need to be updated separately?
>>>
>>> Also all our base applications should be namespaced correctly so
>>> they will be strictly identified as erlang modules: "config" is too
>>> generic, "ddoc_cache" too. Others are probably OK.
>>>
>>> There are probably other things that we could provide as apps:
>>>
>>> - couch_daemon,
>>> - couch_js
>>> - couch_external
>>> - couch_stats
>>> - couch_compaction_daemon
>>> - couch_httpd_proxy
>>>
>>> Anyway again i'm +1 for this move, I really think it's a good idea.
>>>
>>> - benoit
>>
>> I agree on most of this. Roughly I see three general points.
>>
>> First, deciding on whether some things are external deps is definitely up
>> for discussion. Whether couch_mrview is a different app/repo is not
>> necessarily clear cut. Personally I think I over engineered couch_index
>> which blurs the lines a bit. If I could wave a wand I'd have just
>> couch_mrview and it'd be separate. More importantly I think the separate
>> repos make these things more apparent. The fact we're discussing this sort
>> of architecture thing is suggestive that it's forcing us to think a bit
>> harder.
>>
>> Second is the aspect of composability. For instance the mrview thing to me
>> is obviously a different repo precisely so a user could import couch
>> (_core?) directly without requiring the SpiderMonkey dependency. The
>> monolithic repo doesn't allow this without some very non-standard tooling.
>>
>> Thirdly, app naming is always a contention. The config name was actually a
>> hot code upgrade concern. We couldn't reuse couch_config directly at the
>> time. And Adam was also hopeful we could turn it into a useful non-specific
>> config app.
>>
>> Fourthly, and related to secondly, we'll also want to look at splitting
>> other apps out as necessary. The ones you listed I think aren't
>> controversial; it's just that no one has done it yet. My list was purely what
>> existed so far without attempting to carve things up more. I definitely
>> agree we should carve more; I just wanted to cover consensus that carving is
>> the right direction.
>>
>> Fifthly, I'm done typing on my phone. I'll fill in more thoughts tomorrow.
>>