On Thu, Apr 1, 2010 at 7:50 AM, Tim Bunce <tim.bu...@pobox.com> wrote:
> On Thu, Apr 01, 2010 at 12:39:27AM -0400, David Nicol wrote:
>> On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen <a...@perl.org> wrote:
>> > The main point here is that we can't use 20 inodes per distribution.
>>
>> so don't. How much reengineering would be needed to keep CPAN in a
>> database instead of a file system?
>
> Random thoughts...

FWIW I've had similar thoughts.  I was discussing them with David
Wheeler in relation to the proposed PgAN (Postgres).


> * If you squint a little you can view git as a database with excellent
> replication support.

For bonus points, it's smaller.  A bare gitpan is 5 gigs.  BackPAN is 14.


> * cpanminus already supports installing from a git repo.
>
> * For backwards compatibility a simple perl web server could provide a
> classic CPAN http mirror 'view' over a git repo like gitpan.
> This cpan-git-server would create and serve up cached distro tarballs on
> demand.
> Someone could whip up one to work over gitpan as a proof of concept.

It's potentially even simpler over gitpan, as github will produce
tarballs.  You just need to map the URLs.  I say potentially because
github will produce a tarball named after the commit checksum, not the
tag.  That's something I've been on them to fix.
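
A rough sketch of that URL mapping, in Python purely for illustration.
Everything specific here is an assumption, not how gitpan or github is
confirmed to work: the one-repo-per-distribution layout under a "gitpan"
account, tags named after the version, the /tarball/<ref> URL shape, and
the function name itself.

```python
import re

# Assumption: one repository per distribution under a "gitpan" account,
# with one tag per released version.
GITHUB_BASE = "https://github.com/gitpan"

# Classic CPAN mirror path: authors/id/M/MS/MSCHWERN/Dist-Name-1.23.tar.gz
CPAN_PATH = re.compile(
    r"^authors/id/[A-Z]/[A-Z]{2}/(?P<author>[A-Z0-9-]+)/"
    r"(?P<dist>.+)-(?P<version>v?\d[\w.]*)\.(?:tar\.gz|tgz|zip)$"
)

def cpan_path_to_tarball_url(path):
    """Map a CPAN mirror path to a (hypothetical) github tarball URL.

    Returns None when the path does not look like a release tarball.
    """
    m = CPAN_PATH.match(path)
    if m is None:
        return None
    # The tag is assumed to be the bare version string.
    return "%s/%s/tarball/%s" % (GITHUB_BASE, m.group("dist"), m.group("version"))

url = cpan_path_to_tarball_url("authors/id/M/MS/MSCHWERN/Test-Simple-0.94.tar.gz")
```

A compatibility server would still have to rename what comes back, since
the tarball is named after the commit checksum rather than the tag.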


> * The need for widespread mirroring is less significant than it was in
> years past. (Also using git as the inter-mirror transport of source files
> means there'll be much less traffic between mirrors. Effectively only
> the diffs between releases.)

I'm not a sysadmin, but this is my gut feeling too.  Relative to hard
drive prices, CPAN (hell, BackPAN) has shrunk.  I'd imagine the same to
be the case relative to network capacity.


> * New approaches to replication, such as git, don't have to be supported
> by existing mirror providers. A new set of cpan-git-mirror providers could
> emerge.
>
> * Any cpan-git-mirror provider running a cpan-git-server could be
> included in the list of mirrors used by existing installers.
>
> * Over time the number of cpan-git-mirrors and cpan-git-servers could
> grow and the number of traditional CPAN ftp/rsync mirrors could fall.

The central thesis is correct: git provides a very simple, very
compact database that sorts things by version and by distribution.
The downside is that CPAN doesn't really do things by distribution, so
that would have to be worked out.  IMO this is a Good Thing that needs
to be done.

See http://use.perl.org/~schwern/journal/40014 for gitpan's issues
with identifying distributions.
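
To make the "by distribution" problem concrete, here is a minimal
sketch, again in Python for illustration only, of grouping release
filenames into distributions.  The Name-Version.ext pattern is an
assumption about how most uploads are named, not a rule CPAN enforces,
and the function name is made up.  Anything that doesn't fit is set
aside rather than guessed at.

```python
import re
from collections import defaultdict

# Assumed naming convention: Dist-Name-VERSION.tar.gz (or .tgz / .zip).
RELEASE = re.compile(
    r"^(?P<dist>.+)-(?P<version>v?\d[\w.]*)\.(?:tar\.gz|tgz|zip)$"
)

def group_by_distribution(filenames):
    """Split filenames into {distribution: [versions]} plus leftovers.

    Versions are kept in input order; no version comparison is attempted.
    """
    groups = defaultdict(list)
    unparsed = []
    for name in filenames:
        m = RELEASE.match(name)
        if m:
            groups[m.group("dist")].append(m.group("version"))
        else:
            # No recognisable "-VERSION" part, so the distribution is unknown.
            unparsed.append(name)
    return dict(groups), unparsed

groups, unparsed = group_by_distribution([
    "Test-Simple-0.94.tar.gz",
    "Test-Simple-0.92.tar.gz",
    "DBI-1.609.tar.gz",
    "perl5.003_07.tar.gz",
])
```

The leftovers are where the work is: a filename alone doesn't always say
which distribution it belongs to.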
