It hasn't been done because it's outside the design scope of rsync.
It's meant to sync arbitrary filesets in which many, if not all, changes are
made out of band. It's decidedly non-trivial to implement in that mode
unless you're willing to accept a certain window in which your database
On Apr 2, 2010, at 1:50, Arthur Corliss wrote:
And my assertion has been that the excessive stats by the server are a bigger
impediment to synchronization than the inode count.
Well, then one of us doesn't understand how file systems etc. work. :-)
- ask
On Apr 1, 2010, at 19:49, Arthur Corliss wrote:
I can't believe I'm doing this, but ...
The main point here is that we can't use 20 inodes per distribution. It's
Just Nuts. Sure, it's only something like 400k files/inodes now - but at
the rate it's going it'll be a lot more soon enough.
Much of this discussion is beyond my depth but in terms of keeping it
simple, and trying to limit the stat calls on the upstream servers,
what about DNS as a replication model? You could break up the tree at
logical divisions similar to zones and assign them serial numbers
(say a .serial file)
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:
On Apr 2, 2010, at 1:50, Arthur Corliss wrote:
And my assertion has been that the excessive stats by the server are a bigger
impediment to synchronization than the inode count.
Well, then one of us doesn't understand how file systems etc. work. :-)
On Wed, 31 Mar 2010, Ask Bjørn Hansen wrote:
snip
Everyone who doesn't run mirrors says "oh, who cares - it doesn't bother me".
Some of us who do run mirrors say "actually, that sort of thing is important and
an actual issue".
Others reply "then you're doing it wrong". But nobody came with
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:
I can't believe I'm doing this, but ...
:-) All for entertainment's sake...
The main point here is that we can't use 20 inodes per distribution. It's Just
Nuts. Sure, it's only something like 400k files/inodes now - but at the rate
it's going
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:
Talk = ZzZz.
Code = Interesting.
Deployment = Useful.
Please. The talk serves to gauge interest before I waste any time
implementing a solution that's already been rejected out of hand. As I've
mentioned repeatedly I already use rsync, albeit on
On Tue, Mar 30, 2010 at 10:08:57PM +0200, Rene Schickbauer wrote:
Now, if we were to put all files into mercurial, git or the like,
renaming the files so they don't have version numbers in their names but
storing them sequentially as commits so new versions update old ones.
Sort of like
Nicholas Clark wrote:
On Tue, Mar 30, 2010 at 10:08:57PM +0200, Rene Schickbauer wrote:
Now, if we were to put all files into mercurial, git or the like,
renaming the files so they don't have version numbers in their names but
storing them sequentially as commits so new versions update old
David Nicol wrote:
On Sun, Mar 28, 2010 at 2:32 PM, Elaine Ashton eash...@mac.com wrote:
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
Has some sort of disk quota system for CPAN author accounts ever been
considered?
Not specifically, no, at least not that I'm aware of. That would have
--Original Message--
From: Arthur Corliss
To: Dana Hudes
Cc: module-authors@perl.org
Sent: Mar 29, 2010 1:12 PM
Subject: Re: Trimming the CPAN - Automatic Purging
On Mon, 29 Mar 2010, Dana Hudes wrote:
Orcallator, procallator and friends aren't shiny new toys
Adrian Cockcroft wrote initial
I've said nothing till now, because I figured more noise wouldn't help much.
But I quite like the rsync daemon/proxy idea, and as it so happens I'm
attending the OzLabs Unconference in 3 weeks time to hang out with
Tridge, Rusty and the other Australia C/Kernel/Samba/RSync elites.
So I'd be
On Wed, Mar 31, 2010 at 01:03:51PM +1100, Adam Kennedy wrote:
I've said nothing till now, because I figured more noise wouldn't help much.
But I quite like the rsync daemon/proxy idea, and as it so happens I'm
attending the OzLabs Unconference in 3 weeks time to hang out with
Tridge, Rusty
On Wed, Mar 31, 2010 at 10:45 AM, David Landgren da...@landgren.net wrote:
On 31/03/2010 06:52, David Nicol wrote:
new proposal: Make modules pay rent in order to remain on a mirror.
Rent could be in the form of actual user interest, or good reviews.
Use as a dependency could count as rent.
On Sun, Mar 28, 2010 at 07:28:48AM -0700, dhu...@hudes.org wrote:
The danger in a CPAN::Mini and in removing old versions is that one is
assuming that the latest and greatest is the one to use. This is false.
And this is why I run cp5.6.2an.barnyard.co.uk etc.
It wouldn't be difficult for
On Sun, Mar 28, 2010 at 06:04:03PM -0400, David Golden wrote:
As always with perl, it depends. They are laid out just as a normal
CPAN repository, so if you have one in your urllist, something
specified as author/distribution.tar.gz might well resolve.
Not just might well resolve. It *will*
On Tue, 30 Mar 2010, Matija Grabnar wrote:
Er, not exactly. Read
http://www.cvsup.org/howsofast.html
I had read http://www.cvsup.org/faq.html#features item #3.
From what I can see, cvsup uses the rsync algorithm on a file-by-file basis
(it uses just the differential send part of the rsync
On Tue, 30 Mar 2010, Rene Schickbauer wrote:
snip
This could work like any modern, distributed version control systems. That
way, the user would also be able to apply local patches and/or deciding which
changesets to pull in from the main server. Or have a complete, local mirror
and one for
On Sun, Mar 28, 2010 at 2:32 PM, Elaine Ashton eash...@mac.com wrote:
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
Has some sort of disk quota system for CPAN author accounts ever been
considered?
Not specifically, no, at least not that I'm aware of. That would have to be
On Sun, 28 Mar 2010, dhu...@hudes.org wrote:
The entire point of rsync is to send only changes.
Therefore once your mirror initially syncs, the old versions of modules are
not the issue. Indeed, removing the old versions would present an additional
burden on synchronization! The ongoing burden is
On Sun, 28 Mar 2010, Nicholas Clark wrote:
Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?
(Not even knowledge, let alone channels of communication with, let alone
control over)
Because (as I see it, not having done any of this)
On Sun, 28 Mar 2010, Elaine Ashton wrote:
I do very much like Tim's proposal for giving old modules a push to BackPAN
since, with proper communication of the changes to the authors along with a way
to mark exceptions, this would rid CPAN of a lot of cruft that should be on
BackPAN anyway.
On Sun, 28 Mar 2010, Andreas J. Koenig wrote:
Says the author of a module named Paranoid. A lovely coincidence.
:-) As they say, just because you may be paranoid, it doesn't mean that no
one's out to get you.
If you want to study the CPAN checkpointed logs solution running on
the very CPAN
On Sun, 28 Mar 2010, Dana Hudes wrote:
Use of wget and http to download an entire site means numerous TCP opens and
HTTP GET requests. The entire point of rsync is that it knows there are
numerous downloads. It does ONE open. This allows TCP slow start to ramp up
That wasn't exactly what I
www.orcaware.org, I think it was
-Original Message-
From: Arthur Corliss acorl...@nevaeh-linux.org
Date: Mon, 29 Mar 2010 00:31:50
To: Dana Hudes dhu...@hudes.org
Cc: module-authors@perl.org
Subject: Re: Trimming the CPAN
Hi Elaine,
Elaine Ashton wrote:
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
Jarkko and I were talking about it this morning - as he's not in
favour of pruning - while trying to think of a way around the size
problem and he reminded me of the idea that, if I recall correctly
was Andreas'
On Sun, 28 Mar 2010, Dana Hudes wrote:
Why is rsync a problem? Where is the bottleneck in the protocol or the code
implementing it?
Specifics!
SAR is antiquated and doesn't give the info you really need. Using a Linux system?
Use procallator and feed resulting collected data to ORCA. Better yet,
I think that Andreas's concept of treating these mirrors as a database is good.
Checkpoint logical log replay is better than a simple rsync for large numbers
of files.
The replication problem for databases is well-understood and open-source code
for it is available from at least PostgreSQL.
# from Andreas J. Koenig
# on Saturday 27 March 2010 21:02:
If you want to study the CPAN checkpointed logs solution running on
the very CPAN for exactly one year now: File::Rsync::Mirror::Recent
What needs to be done is really extremely trivial: rewrite it in C and
convince the rsync people to
On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
:-) You'll have to pardon my indiscriminate epithets. The barbs are coming
from multiple directions. My point still stands, however. Your experience,
however worthy, has zero bearing on whether or not my experience is
just as worthy.
The entire point of rsync is to send only changes.
Therefore once your mirror initially syncs, the old versions of modules are
not the issue. Indeed, removing the old versions would present an additional
burden on synchronization! The ongoing burden is the ever-growing CPAN.
The danger in a CPAN::Mini
On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote:
On Sat, 27 Mar 2010, Elaine Ashton wrote:
Actually, I thought I was merely offering my opinion both as the sysadmin
for the canonical CPAN mothership and as an end-user. If that makes me a
prick, well, I suppose I should go
the author/package name
manually I think).
Of course, I've never done this myself, so I could be mistaken
--Original Message--
From: Shlomi Fish
To: module-authors@perl.org
Cc: dhu...@hudes.org
Sent: Mar 28, 2010 11:31 AM
Subject: Re: Trimming the CPAN - Automatic Purging
On Sunday 28
Why is rsync a problem? Where is the bottleneck in the protocol or the code
implementing it?
Specifics!
SAR is antiquated and doesn't give the info you really need. Using a Linux system?
Use procallator and feed resulting collected data to ORCA. Better yet, use
DTrace or at least truss. Compile
* Graham Barr gb...@pobox.com [2010-03-26 10:20]:
On Mar 25, 2010, at 8:42 AM, Barbie wrote:
Lastly I would also personally be annoyed if only the latest
versions were available, as I often make great use of the diff
tool on search.cpan.org. Having only the latest version
renders that great
* Nicholas Clark n...@ccl4.org [2010-03-28 18:20]:
I'm missing something here, I suspect.
Yes, you are.
How can HTTP be more efficient than rsync? The only obvious
method to me of mirroring a CPAN site by HTTP is to instruct
a client (such as wget) to get it all.
As Arthur has repeatedly
* Dana Hudes dhu...@hudes.org [2010-03-29 04:30]:
Using HTTP for this is inefficient. It makes for slower file
transfer because you keep rerunning path MTU probes and TCP
slow start. It makes extra socket handles opening and closing
Errm, you missed the last decade. (HTTP/1.1 has keep-alive and
On 26 Mar 2010, at 23:32, Arthur Corliss wrote:
But it's the weakest and simplest link to replace.
Quite a bit of the discussion here on this topic has revolved around an
explanation of why that isn't the case. Setting up rsync is trivial for mirror
operators. Any alternative would likely be
On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
The only snag I can foresee in trimming back on the abundance of modules is
the case where some modules have version requirements for other modules where
it will barf with a mismatch/newer version of the required module (I bumped
into this
On Fri, 26 Mar 2010, Arthur Corliss wrote:
But what the hell do I know. I don't run a *CPAN* mirror, so I must be
freaking clueless...
It's not about what you know, but about what you are willing to
do yourself.
At some point you have to accept that the people who *do* the work
decide *how*
On Friday-201003-26 13:20, Arthur Corliss wrote:
On Fri, 26 Mar 2010, Andy Lester wrote:
Absolutely. This factual info would ideally look like this:
Of the 17,000 distros on CPAN, there are 8,000 that have versions more than a year
older than the most recent one. If those distros with
On Friday-201003-26 19:02, Arthur Corliss wrote:
On Fri, 26 Mar 2010, Jarkko Hietaniemi wrote:
The total size is not the problem. The number of files is. Vanilla
rsync is horribly inefficient (not the protocol, which is genius, mind)
because a client coming by and asking for updates
On Fri, 26 Mar 2010, Elaine Ashton wrote:
Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2
years which is the canonical mirror for a large number of mirrors and the
perspective of having a few terabytes spinning in storage changes quite
dramatically when you are
On Sat, Mar 27, 2010 at 10:52:05AM -0800, Arthur Corliss wrote:
I think I was quite explicit in saying that efficiencies should be pursued
in multiple areas, but the predominant bitch I took away from your thread
dealt with the burden of synchronizing mirrors. What's the easiest way to
On Sat, 27 Mar 2010, Nicholas Clark wrote:
I
You?
Or someone else?
I am quite happy to agree that your understanding and experience of storage
management is better than mine. But that's not the key question, in a
volunteer organisation. The questions I ask, repeating Jan's comments in
On Mar 26, 2010, at 16:02, Arthur Corliss wrote:
Why use rsync, then? Why not have checkpointed logs on cpan with
additions/removals logged by date so you can roll forward on the client,
processing only those files? It would be trivial to set up and a lot more
efficient.
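
A rough sketch of the roll-forward idea quoted above, for readers who want something concrete. It is an illustration only, written under assumptions: the `LogEntry` fields, the `roll_forward` name, and the use of integer timestamps as checkpoints are all hypothetical, and none of it describes how PAUSE, rsync, or File::Rsync::Mirror::Recent actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One line of a hypothetical change log kept on the master."""
    timestamp: int  # when the change was logged (seconds since the epoch)
    action: str     # "add" or "remove"
    path: str       # path relative to the mirror root


def roll_forward(local_files, log, checkpoint):
    """Apply every log entry newer than `checkpoint` to a set of local paths.

    Returns (updated set of paths, paths the client still has to fetch,
    new checkpoint). Only logged changes are examined; the rest of the
    tree is never scanned.
    """
    files = set(local_files)
    to_fetch = []
    new_checkpoint = checkpoint
    for entry in sorted(log, key=lambda e: e.timestamp):
        if entry.timestamp <= checkpoint:
            continue  # already processed in an earlier run
        if entry.action == "add":
            files.add(entry.path)
            to_fetch.append(entry.path)
        elif entry.action == "remove":
            files.discard(entry.path)
            # Added and removed since the checkpoint: no need to fetch it.
            to_fetch = [p for p in to_fetch if p != entry.path]
        else:
            raise ValueError(f"unknown action: {entry.action!r}")
        new_checkpoint = entry.timestamp
    return files, to_fetch, new_checkpoint
```

The client would store the returned checkpoint and pass it back on the next run, so the work per sync is proportional to the number of logged changes rather than to the number of files on the mirror.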
I find it
On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote:
The time-honored tradition of many open source communities is to talk. And
talk. And talk. The problem is that this solves nothing. To do, does.
You are free to decide to take this as a personal insult.
I didn't take it as an insult, I took
Oh, I understand that fully. And I'd be happy to lend some of my
time. But
you don't make people inclined to help when people are lobbing snarky
comments like we'll wait breathlessly for you to do it.
The time-honored tradition of many open source communities is to talk.
And talk. And
On Mar 27, 2010, at 2:52 PM, Arthur Corliss wrote:
Don't be such an arrogant prick. You guys made baseless assumptions about
people's experience with storage management in an attempt to disregard their
opinions. That's being a dick by any metric.
Actually, I thought I was merely offering
On Sat, 27 Mar 2010, Elaine Ashton wrote:
Actually, I thought I was merely offering my opinion both as the sysadmin for
the canonical CPAN mothership and as an end-user. If that makes me a prick,
well, I suppose I should go out and buy one :)
:-) You'll have to pardon my indiscriminate
On Mar 25, 2010, at 8:42 AM, Barbie wrote:
Lastly I would also personally be annoyed if only the latest versions
were available, as I often make great use of the diff tool on
search.cpan.org. Having only the latest version renders that great tool
redundant :(
I use that too :-) and it is
Currently on PAUSE you have to explicitly delete old uploads.
How about changing it so you have to explicitly KEEP old uploads
that appear to have been superseded?
PAUSE already has a mechanism to delete files at some future point in
time. That's currently only used as part of a safety/sanity
On Mar 25, 2010, at 4:12, Tim Bunce wrote:
Currently on PAUSE you have to explicitly delete old uploads.
How about changing it so you have to explicitly KEEP old uploads
that appear to have been superseded?
I like it.
I agree with Jarkko that there should be a way to pin some versions and
On Mar 25, 2010, at 8:38, Andy Armstrong wrote:
I like that solution better
[snip]
But solution to what? Are we convinced there's actually a problem here?
CPAN has almost 200k files. www.cpan.org says there are 17627 modules.
rsyncing a gazillion files doesn't work that well (on the
What Jarkko said.
On Mar 25, 2010, at 08:00, Jarkko Hietaniemi wrote:
I have one case where the v1 and v2 of a module are simply
incompatible, but v1 still works, and unless the users have a
compelling reason, they won't migrate. Pulling the rug from under
them would be quite
On Fri, 26 Mar 2010, Ask Bjørn Hansen wrote:
I find it curious that everyone who's actually involved in syncing the files or running
mirror servers seems to think it generally sounds like a good idea and everyone who
isn't says it's not worth the effort.
Sure, I don't run a CPAN mirror, but
On Mar 26, 2010, at 8:23 PM, Arthur Corliss wrote:
Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of
storage as part of my day job. I think it's a tad presumptuous to disregard
input just because we're not in your inner sanctum. As I mentioned in a
follow up
On Thu, Mar 25, 2010 at 11:12:32AM +, Tim Bunce wrote:
Currently on PAUSE you have to explicitly delete old uploads.
Which often is a good thing. While BACKPAN exists, it isn't somewhere
that many go to look for old distributions. For me and probably others,
BACKPAN only distributions are
On 25 Mar 2010, at 15:36, Chris Nandor wrote:
I like that solution better
[snip]
But solution to what? Are we convinced there's actually a problem here?
--
Andy Armstrong, Hexten
On Mar 25, 2010, at 10:38 AM, Andy Armstrong wrote:
But solution to what? Are we convinced there's actually a problem here?
The first two rules of optimization club:
1) You do not optimize.
2) You do not optimize without measuring.
As soon as someone can explain specifics of the problem,