Re: [RANT] Should we try to keep compatibility with old perl5s?

2018-08-12 Thread Arthur Corliss

On Sat, 11 Aug 2018, Shlomi Fish wrote:


Hi all!

This post is a little flamebait, so please try to keep the discussion civil.

Anyway, after reading the discussion in this public github issue, and following
some of the links (especially
https://szabgab.com/what-does--if-it-aint-broke-dont-fix-it--really-mean.html ),
do you think I was being unreasonable, or should I as a CPAN
author/maintainer/adopter accommodate for people running old perl5s, in my case
5.10.x and below:

https://github.com/shlomif/perl-XML-SemanticDiff/issues/3

This reminds me of what chromatic wrote here -
https://www.nntp.perl.org/group/perl.perl5.porters/2008/09/msg140206.html :

"This is why we can't have nice things."

Any comments or opinions? I think I'll relax by watching a nice and fun video.


I think this raises a question: how many developers are actually testing
with those older versions?  From a purely pragmatic perspective I'd think
devs should only officially support down to revs they're actively
testing, but at the same time staying cognizant of the oldest perl revs
shipped as part of non-EOL'd Unices, etc.

Personally, I'm still supporting 5.008003, but I occasionally consider
whether newer syntactic sugars might be worth a jump.  I have to admit I
quit supporting 5.006005 just because it was getting tedious having to
maintain my own patches just to compile and install it.

Ideally, whatever your choice, a dev shipping code for the benefit of the
community shouldn't be badgered for not wanting to take on the extra
maintenance efforts.  At the same time, said dev shouldn't be surprised if
wider use of the same contributions is limited until the broader community
catches up.

Do what you want, dude.  We might not all make the same decisions, but we
all get it.

--Arthur Corliss
  Live Free or Die


Re: [cpan-questions #32443] Re: rt.cpan.org keeps logging me out.

2016-11-22 Thread Arthur Corliss

On Tue, 22 Nov 2016, Shlomi Fish wrote:


The problem is that in order to improve the security of my passwords, I
keep them all encrypted using a master password. Firefox has a built-in
feature for that and, if you don't set a master password then the
passwords are stored using a relatively easy-to-reverse process which every
process on the local system can use (or at least those running as the local
user). There's some old discussion of it here:

http://catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s09.html

Since my firefox password is non-trivial, entering it to fill in the
rt.cpan.org password whenever I restart firefox, restart my
https://en.wikipedia.org/wiki/X_Window_System or restart the machine (for a
new kernel, glibc, etc.) is quite a hassle. What will make my life more
tolerable would be a browser add-on that will allow me to keep the
rt.cpan.org password (and only that) unencrypted (as I already have it in
"~/.pause" anyway).


Perhaps this is just me, but there seems to be some cognitive dissonance
here.  You've clearly put some thought into the security of your passwords,
yet you're putting less thought into securing a session token?  Or you want
a plugin to bypass the normal browser key store?

Maybe I'm overthinking this.  But, then, I don't trust browsers to begin
with.  I don't want them maintaining any kind of state for me over any
significant length of time.

    --Arthur Corliss
  Live Free or Die


Re: Top level name proposal - ComputeCluster

2014-09-05 Thread Arthur Corliss

On Fri, 5 Sep 2014, James E Keenan wrote:



Could that be shortened to simply:  Cluster ?



If this happens I'm claiming Cluster::Fu... well, I think you know where I'm
going with this ;-)

--Arthur Corliss
  Live Free or Die


Kevin Johnson

2013-12-14 Thread Arthur Corliss

Greetings:

Seems like there was a rash of abandoned modules by Kevin Johnson dating 
back to the late '90s.  Many of his modules have been rehomed, but he's 
still sitting on a couple, including a registered namespace that no code was
ever published for.


Long story short:  does anyone know if this guy ever popped up again?  The
address info listed on his profile doesn't seem to be valid.  I'd like to
get in touch with him in regards to Net::ICAP.

  http://search.cpan.org/~kjohnson/

I appreciate your time, and any tips you may have.

--Arthur Corliss
  Live Free or Die


Re: COMAINT on https://metacpan.org/release/String-Random

2013-12-03 Thread Arthur Corliss

On Tue, 3 Dec 2013, André Walker wrote:


Shlomi++

Not only was he really polite with the original author, he is also
contributing to the entire Perl community. He wasn't cocky at all! Great
example for whoever wants to contribute to an existing un-maintained module.

Cheers,
André


Shmuel++, actually.  Shlomi's and CPAN's actions are well justified, no
question, but people do lose things in their inbox, or just fall off the
net for periods with plenty of legitimate reasons.

Shlomi wasn't the original author, so when the original author pipes up the
most polite action would be to inquire what resources *he's* set up, and see
how they can work in concert.

When I read Shlomi's response it definitely came off (to me, at least) like
he'd conquered the kingdom and Steve had to ask permission to get back in.

I'm sure that wasn't Shlomi's intent, but here we are on that faceless
Internet again, with none of the normal human cues to aid us.  A more
conciliatory tone would have helped.

--Arthur Corliss
  Live Free or Die


Re: How to add your avatar to Google search results involving CPAN modules

2013-11-21 Thread Arthur Corliss

On Thu, 21 Nov 2013, David Cantrell wrote:


Both!

I mostly prefer search.cpan.org because I'm used to it :-) which, I admit, 
isn't a very good reason.


I'm in the same boat.  I have yet to hear of any reason compelling enough to
make me break old habits...

--Arthur Corliss
  Live Free or Die


Re: How to break apart a large distribution?

2011-10-18 Thread Arthur Corliss

On Tue, 18 Oct 2011, Fields, Christopher J wrote:


Hi,

The BioPerl core developers (including myself) have decided to work on breaking 
up the huge code base into separate distributions on CPAN, using dependencies 
to install only the needed modules (something WAY overdue).  I noticed that 
several distributions have undergone similar paths (LWP being a recent example).

Any pointers we could use?  Can this be gradually done (BioPerl is HUGE, around 
1000 modules), or should we have these all ready to go at once?


You can do it gradually: just break out the modules with no internal
dependencies first, then gradually roll up the hierarchy.  Update BioPerl
last (or note the new distributions as you issue updates for BioPerl
itself).

In short, assuming you have on-going development of all this code while
you're trying to break it all out, the gradual route is probably going to be
more manageable.
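
If it helps, here's a rough sketch of how you might find the leaf modules to
break out first.  It's only an illustration: it assumes the usual lib/ layout
and treats any "use Bio::..." line as an internal dependency, which is cruder
than a real PPI-based scan.

    #!/usr/bin/perl
    # Rough sketch only: list modules with no internal (Bio::*) dependencies,
    # i.e. the natural candidates to split into separate distributions first.
    use strict;
    use warnings;
    use File::Find;

    my %deps;
    find(sub {
        return unless /\.pm$/;
        (my $mod = $File::Find::name) =~ s{^lib/}{};
        $mod =~ s{\.pm$}{};
        $mod =~ s{/}{::}g;
        $deps{$mod} ||= {};
        open my $fh, '<', $_ or return;
        while (my $line = <$fh>) {
            # record any "use Bio::Foo" or "require Bio::Foo" as an internal dep
            $deps{$mod}{$1}++ if $line =~ /^\s*(?:use|require)\s+(Bio::[\w:]+)/;
        }
    }, 'lib');

    # modules whose internal dependency list is empty can be broken out first
    print "$_\n" for grep { !%{ $deps{$_} } } sort keys %deps;

Anything that prints there can be cut loose without waiting on the rest of the
hierarchy.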

--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-09-11 Thread Arthur Corliss

On Fri, 9 Sep 2011, Aristotle Pagaltzis wrote:


Protecting your communication with another party from third
parties needs no justification whatever. It should be the assumed
default that exceptions are made from, not the exception from the
rule requiring proof.

If I'm having a massive argument with my personal foe #1, the
fact that I distrust this person on all conceivable levels does
not make you welcome to eavesdrop on the conversation.

It does not matter the very least bit how trustworthy the other
party is: uninvited third parties have no business knowing what
you do or do not say to the other party.


This is about assessment of risk, and in the example of Google that's
exactly what you're missing.  I would agree with you if your traffic was
going to a trusted party, i.e., a server under the control of entities you
know and trust.  But it's not.  So who's the greater danger to you?  The
megalith cataloging and profiling all of your communications across multiple
networks and devices, or the script kiddie at the next table?  It should be
obvious who has the greater ability to harm you.

And that's what makes so much of this thread ridiculous.  Some here are
excessively paranoid about the most peripheral and fleeting contacts, yet
don't care about the data-mining operation they're securely funneling
all their information to.  If that's what makes you paranoid, then I'd have
to say you're not paranoid enough, not by a long shot.


That's the "I have nothing to hide" argument.


No, read above.  It's the assessment of risk argument.  And one that's
pertinent on many, many levels.  As has been pointed out by several parties
on this list, SSL-everywhere is not a zero-cost proposition, so if you're
going to go to that length there should be tangible benefit.


It does not matter how embarrassing it is or isn't. Irrelevant.
It's much simpler: unless they want you to know (or it affects
you directly in some undue manner etc. -- insert reasonable
qualifiers here), you have no business knowing. How yawn-worthy
that information is makes no criterion.

The one criterion that does apply is whether making the channel
secure against you trying to find out is too expensive relative
to its sensitivity. So far, MetaCPAN seems to be less than
straining under the load, so I don't see a justification to
reconsider.

We used to avoid SSL unless necessary because it was expensive.
I agree with the engineers who are saying that it's time to
re-examine that as a default assumption -- whether they are
employed by Google or not makes no difference to me as far as
that statement is concerned.


Someone else pointed out that SSL is not trivial or low cost to many
embedded devices.  That's true.  I pointed out that it makes traffic shaping
and caching strategies to relieve backbone congestion considerably more
complicated.  It may be cheap for servers to terminate those connections
with the power inherent in the average modern server, but that's just
technical narcissism.  It gives no thought to the rest of us at all.


You won't see me disagreeing that the concentration of power in
Google's hands is dangerous. But that's a different matter, even
though very important in its own right. Abolishing Google would
not reduce the justification to secure communications. The two
issues are independent -- so the question you pose is entirely
beside the point to the matter at hand.


I have no wish to abolish Google, and this isn't just a Google problem, it's
a social media problem, a search engine problem, it's a problem of trust
with any third party that you lack contractual safeguards or control over.

That said, I still think we should have an actual *benefit* before we slap a
dollop of SSL on everything.  I'm not opposed to metacpan having an SSL
interface, but why on earth would you place barriers to use on a public
resource, containing public information?!  That's the operators of metacpan
forcing their peculiar dogmatism and fanaticism on the rest of us.  Which I
don't actually have a problem with as long as they're not the *default* or
*only* repositories of that information.  But if they aim to become such
they need to be bent on maximum accessibility.  That's just common sense.
Explain to me why giving people a choice of interfaces is a bad thing.

SSL gives them a hard-on.  Great.  I share their preferences, but I don't
share the inclination to force it on the rest of the world.  Taken to that
extreme, we'd be SSL'ifying content distribution networks.  And is that
friendly to the network operators carrying that traffic?  Is that what we
really want to do?  Might as well, I guess, since even that traffic has
*some* intel value.  But I would argue that the cost incurred for very
little real benefit should be considered.

--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-30 Thread Arthur Corliss

On Tue, 30 Aug 2011, sawyer x wrote:


All you had to do was originally write "as much as I understand people's
desire for encryption, I still believe that 1. SSL is only necessary in
specific websites (example A, example B) and 2. when working with Google we
shouldn't be worrying about encryption there, but rather Google itself."

Instead you opted to butt heads with someone, belittling their whole "SSL
doesn't have large overhead" remark with "who cares? Google!"  You could have
made an eloquent respectful comment, saying that while SSL apparently
doesn't cost much, Google is really what bothers you and that you'd rather
have a discussion about that.

I don't think anyone (including myself) would have anything bad to say about
it, and you would have been most likely successful at raising that point of
issue. I've personally moved to DuckDuckGo and considering replacing Gmail.


I guess the little winky smiley face on my original post was lost on
you, eh?  I shall have to be far less subtle in the future, but for now 
I'll let my e-mails stand on their own.  And I won't point out how I 
specifically requested that a Google-centric conversation should be held 
off-list...  Oops.  ;-)



Unfortunately, I've most likely committed the same belittling, whether it
was towards you, Shlomi, David, or anyone else here. So, my apologies for
this and I will be clearing my desk of this thread.


I thought that the whole thread was silly, as is the concept that metacpan
would dictate SSL-only for questionable gains.  And I think my
interjection was pretty fair, inoffensive, and good natured.  But, maybe
quietly lurking exposes my better side.  :-)

--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-29 Thread Arthur Corliss

On Mon, 29 Aug 2011, David Nicol wrote:


I'll take this bait, swallow it, and hopefully bite off the line:

Yes, Google is going to use query data for its gain. But, Google's
business model
also involves *aggregation* and *respecting individual privacy*.

The SSL to Google Search is supposed to protect one from
eavesdropping, as has been
pointed out, by the other people in Starbucks.  And it does this.

Say you're sitting in Starbucks, searching for clues concerning an embarrassing
medical condition. Your risk is, Mallory will intercept your packets
and tell his buddies
and they will huddle and point.

If some Google tech sees your query among the millions of other queries and
points it out to /his/ buddies and they huddle and point, that doesn't
affect you the same
way, if at all. They won't be pointing at you, the victim of an
embarrassing medical
condition, they will be merely pointing at an evidence of your
existence. And such
attention might actually bring more attention, in general, to the
problem of severe
triskaidekaphobia or whatever, which would be a good thing for you --
in the aggregate.
The resulting open discussion of severe triskaidekaphobia might help
lift the crippling stigma
that has followed the victims for so long, without any unpleasant
direct confrontations.


I think you're still missing my point and focusing on defending a company
you obviously like.  Contact me off the list if you want to discuss/debate 
the actual dangers that companies like Google present.


Otherwise, let's focus on the crux of my argument:  trusting any third party
with your personal information whose primary business is selling the use of
your information is foolish, and the use of SSL as your conduit to them
should not make you feel safer.  That company is liable to be a greater
danger to your privacy than random wifi eavesdroppers.

Likewise, the use of SSL to conceal your access of highly public (and
specialized) information on metacpan also provides no tangible benefit for
90% of the users.  They should offer SSL as an option, but not mandate it
for those of us who derive no benefit from it.  Again:  a resource like
metacpan should aim for maximum accessibility...

--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-28 Thread Arthur Corliss

On Sun, 28 Aug 2011, Aristotle Pagaltzis wrote:


http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

   In January this year (2010), Gmail switched to using HTTPS for
   everything by default. Previously it had been introduced as an
   option, but now all of our users use HTTPS to secure their email
   between their browsers and Google, all the time. In order to do
   this we had to deploy *no additional machines* and *no special
   hardware*. On our production frontend machines, SSL/TLS accounts
   for less than 1% of the CPU load, less than 10KB of memory per
   connection and less than 2% of network overhead. Many people
   believe that SSL takes a lot of CPU time and we hope the above
   numbers (public for the first time) will help to dispel that.

   If you stop reading now you only need to remember one thing:
   *SSL/TLS is not computationally expensive any more*.

   [...]

   Also, don't forget that we recently deployed encrypted web search
   on https://encrypted.google.com. Switch your search engine!


These comments are pretty funny once you consider that you're making a
secure connection to an independent party who has a commercial and
fiduciary responsibility to exploit every bit of data you give them.

With friends like Google protecting your information, who needs 
encryption?  ;-)


--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-28 Thread Arthur Corliss

On Sun, 28 Aug 2011, Aristotle Pagaltzis wrote:


Right, so just let everyone in any coffee shop or any other open network
you connect to sniff all your traffic.

Did you have an actual point?


Yep, but it appears you completely missed it.  I use encryption all the
time, but outside of authentication its merit is questionable when it
concerns information that is a) public information (especially in the
context of published open source) and b) information going to an 
untrustable third party like Google.


Personally, I have no use for metacpan, and don't care what they do.  But as
a general operating principle, I like to use the appropriate tools where
they're *appropriate*.  I expect my bank's websites to be fully SSL, I
expect my on-line brokerage's sites to be fully SSL.  But what exactly is
the risk with a search engine of highly specialized and highly public
information?  I fail to see the benefit, and I tend towards paranoia
naturally.

OSS is about freedom & choice.  As long as users have a choice (an
alternative to metacpan) feel free to force your preferences on the users.
But in the unfortunate circumstance where metacpan becomes the only choice
it'd be nice if the maintainers try to be a little less dogmatic about it.
They should be inclined towards maximum accessibility, not maximum
pedagoguery.

I know I didn't get the memo but I think someone did claim that metacpan 
was the de facto interface these days...


--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-28 Thread Arthur Corliss

On Sun, 28 Aug 2011, sawyer x wrote:


You clearly misunderstood Aristotle. He doesn't care about a comment against
Google, and I'm sure he has no special affinity towards it. He simply had a
good remark on a discussion of the effectiveness and CPU costs of SSL
encryption and it was ignored with a completely irrelevant comment.

Google might be another Microsoft, it might be worse, but it is *irrelevant*
to the question of SSL security and the costs of enabling it by default.


My humor was perhaps too subtle, since you didn't get the relevance of my
reply.  Google switching to SSL by default is as pointless as metacpan doing so.  In
the former case it's the protection of delivery to/from an entity that
not only doesn't have your best interest at heart, but has a business built
on exploiting *your* information for *its* benefit.  Utterly pointless.

In the latter case you have a search engine whose use is basically the
retrieval of information based on *published* open source software, and
highly published at that, given the world-wide replication of CPAN itself.
What exactly is metacpan protecting?  Is it that embarrassing that programmer
Joe can't remember what module function foo was defined in?  Can someone
really derive significant benefit from witnessing Harry browse the API for
WWW::Retrieval::LOLCats or what have you?

So, regardless of the incremental costs of implementing SSL, *why* is the
mandatory use of SSL even considered intelligent, rational, or in any other
way beneficial?  I wasn't going to get involved in this thread, but the
Google bait was too spot on to ignore.

--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-28 Thread Arthur Corliss

On Sun, 28 Aug 2011, Eric Wilhelm wrote:


I didn't think it was a question of CPU speed anytime in the past
decade.  How does a proxy cache encrypted data?


Bringing up proxies is an excellent point.  While most proxies do support
SSL tunnelling, this does make the request uncacheable since the proxy never
knows anything about the connection outside of the host & port it's
tunnelling to.
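
For anyone who hasn't watched it on the wire, this is roughly all the proxy
ever sees of a tunnelled HTTPS request -- just the CONNECT line naming host
and port.  The proxy address below is a placeholder, not a real host:

    #!/usr/bin/perl
    # Illustration of why tunnelled HTTPS is opaque to the proxy: the only
    # plaintext it handles is the CONNECT request below.
    use strict;
    use warnings;
    use IO::Socket::INET;

    my $proxy = IO::Socket::INET->new(PeerAddr => 'proxy.example.net:3128')
        or die "cannot reach proxy: $!";
    print $proxy "CONNECT metacpan.org:443 HTTP/1.1\r\n",
                 "Host: metacpan.org:443\r\n\r\n";
    print scalar <$proxy>;   # e.g. "HTTP/1.1 200 Connection established"
    # Everything after this point is TLS end to end, so there is nothing
    # left for the proxy to cache, filter, or inspect.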

I run a proxy cluster myself, and I do force caching of search engine
responses for a short window (typically on the order of a few hours), and it
does tend to pay off, especially when notable events occur in the world.
Obviously, SSL bypasses the cache altogether.  And I can only get away with
this because the businesses I support all want the same safe levels
applied to all requests, so I don't have to worry about inappropriate
content in some people's results.

Which brings to mind yet another point:  for those of us providing content
filtering services via proxies, SSL is a huge problem.  The only good
solution is to do transparent interception of SSL connections with your
proxies serving up a private CA-signed certificate using wildcards, but
that requires installing your private CA's root certificate on all clients,
and even then there are clients that it still won't work on.  Never mind
that the concept of spoofing external organizations' certificates is
insanely dangerous in its own right.


--Arthur Corliss
  Live Free or Die


Re: MetaCPAN is quickly becoming the de-facto interface to CPAN

2011-08-28 Thread Arthur Corliss

On Sun, 28 Aug 2011, Arthur Corliss wrote:

snip


Which brings to mind yet another point:  for those of us providing content
filtering services via proxies, SSL is a huge problem.  The only good
solution is to do transparent interception of SSL connections with your
proxies serving up a private CA-signed certificate using wildcards, but
that requires installing your private CA's root certificate on all clients,
and even then there are clients that it still won't work on.  Never mind
that the concept of spoofing external organizations' certificates is insanely
dangerous in its own right.


I'm going to preemptively qualify this brain dump as relevant to the
metacpan debate because I would consider metacpan's content, search results,
etc., to be highly cacheable.  More so than a general purpose engine like
Google, metacpan's results would tend to be more applicable to multiple
users' searches.  And yet the whole SSL-only mindset would hamper an
individual network operator's ability to control and shape its network.

Hopefully no one misconstrues this as me being against SSL sites; I'm
extremely in favor of them, particularly with organizations hosting my
sensitive information.  I only think metacpan should offer both HTTPS and
HTTP interfaces.  Let those ultra-paranoids among us use the HTTPS, and the
rest of us HTTP.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-04-02 Thread Arthur Corliss

On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:



On Apr 2, 2010, at 1:50, Arthur Corliss wrote:


And my assertion has been that the excessive stats by the server are a bigger
impediment to synchronization than the inode count.


Well, then one of us don't understand how file systems etc work.  :-)


Indeed.  If you're running UFS perhaps you might have a gripe.  But with
many filesystems in use supporting dynamic allocation groups with the inode
data stored near the actual data blocks, along with b-tree indexing, this
isn't as much of an issue for many of us.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread Arthur Corliss

On Wed, 31 Mar 2010, Ask Bjørn Hansen wrote:

snip


Everyone who doesn't run mirrors says "oh, who cares - it doesn't bother me."

Some of us who does run mirrors say "actually, that sort of thing is important and
an actual issue"..

Others reply "then you're doing it wrong".   But nobody came with something reality based
that'd be right.


Some revisionist history here.  I run mirrors (not CPAN) and know full well
the limitations and inefficiencies of rsync.  To date, not one of you has
been able to refute that at this scale rsync is hurting you.  But most of
you have been obstinately against finding a more efficient way of doing things.

I've made a viable suggestion, and offered some time to work on it.  But
you've made it abundantly clear that it's not welcome.


The main point here is that we can't use 20 inodes per distribution.  It's Just 
Nuts.   Sure, it's only something like 400k files/inodes now - but at the rate 
it's going it'll be a lot more soon enough.


That's a problem, but not likely the biggest drag on server I/O you're
suffering.  Might that be, ahem, rsync?


HOWEVER: Right now more of those are wasted on other things (.readme files, 
symlinks, ...) -- some of which have solutions in progress already.

I don't think anyone is arguing that we NEED to delete the old distributions; 
only that they do indeed have a cost to keep around in the main CPAN.


You're right, I'm not arguing the need for the cruft.  I've only pointed out
the obvious reality that trimming files only postpones the I/O management
issues that at some time are likely going to have to be addressed, anyway.
And that you'll get less bang for the buck (or man hour) by treating the
symptoms, not the disease.

For the record:  if that's what you want to do, have at it.  Let's just not
be disingenuous about the fact that we're abrogating our responsibilities as
technologists by refusing to address the real problems and weaknesses of the
platform.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread Arthur Corliss

On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:


I can't believe I'm doing this, but ...


:-) All for entertainment's sake...


The main point here is that we can't use 20 inodes per distribution.  It's Just 
Nuts.   Sure, it's only something like 400k files/inodes now - but at the rate 
it's going it'll be a lot more soon enough.


That's a problem, but not likely the biggest drag on server I/O you're
suffering.  Might that be, ahem, rsync?


That reply doesn't even make sense.


Then you've ignored most of this thread.  Inode counts themselves aren't
indicative of anything.  It's the I/O access patterns that are.  And my
assertion has been that the excessive stats by the server are a bigger
impediment to synchronization than the inode count.


You're right, I'm not arguing the need for the cruft.  I've only pointed out
the obvious reality that trimming files only postpones the I/O management
issues that at some time are likely going to have to be addressed, anyway.
And that you'll get less bang for the buck (or man hour) by treating the
symptoms, not the disease.

For the record:  if that's what you want to do, have at it.  Let's just not
be disingenuous about the fact that we're abrogating our responsibilities as
technologists by refusing to address the real problems and weaknesses of the
platform.


You are confusing "we", "I" and "you" again.


Perhaps.




Yes, I (and I'm guessing everyone else who have thought about it for more than say 5 
seconds) agree that having rsync remember the file tree to save the disk IO for each sync 
sounds like an obvious solution.

But reality is more complicated.  If it was such an obviously good solution someone would 
have done it by now.  (For starters play this question: What is the kernel 
cache?).


It hasn't been done because it's outside the design scope of rsync.
It's meant to sync arbitrary filesets in which many, if not all, changes are
made out of band.  It's decidedly non-trivial to implement in that mode
unless you're willing to accept a certain window in which your database may
be out of date.

But, in a situation like PAUSE, where the avenues by which files can be
introduced into the file sets are controlled, it does become trivial.  It's
the gatekeeper; it knows who's been in or out.
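
To make the point concrete, the gatekeeper side could be as dumb as appending
one line per change to a dated log.  The "ADD"/"DEL" format and the txnlogs/
path below are purely invented for illustration, not anything PAUSE actually
does:

    #!/usr/bin/perl
    # Sketch of the gatekeeper side: whatever accepts or deletes a file simply
    # appends one line to a dated transaction log.  Format and paths are
    # hypothetical, for illustration only.
    use strict;
    use warnings;
    use File::Path qw(make_path);
    use POSIX      qw(strftime);

    sub log_transaction {
        my ($op, $path) = @_;                  # $op is 'ADD' or 'DEL'
        my $day = strftime('%Y%m%d', gmtime);
        make_path('txnlogs');
        open my $log, '>>', "txnlogs/$day.log"
            or die "cannot append transaction log: $!";
        print {$log} "$op $path\n";            # one line per repository change
        close $log;
    }

    log_transaction('ADD', 'authors/id/A/AB/ABC/Foo-Bar-1.00.tar.gz');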


Andreas' solution is much more sensible -- and as have been pointed out before 
we DO USE THAT; but the problem here is not with clients who are interested 
enough to do something special and dedicate resources to their CPAN mirroring.


By all means, I'm not opposed to any solution that actually addresses the
problem.  I don't agree that it would be the fastest path to implementation, but
there's no question that File::Rsync::Mirror::Recent would help things.
I'd support (and help) that goal.

My objections are more properly directed to those stuck on just deleting
files from the tree.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread Arthur Corliss

On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:


Talk = ZzZz.
Code = Interesting.
Deployment = Useful.


Please.  The talk serves to gauge interest before I waste any time
implementing a solution that's already been rejected out of hand.  As I've
mentioned repeatedly I already use rsync, albeit on much smaller filesets
which don't kill my servers.

So far I haven't seen much openness by those actually affected by the problem
in considering an alternative to rsync.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-03-30 Thread Arthur Corliss

On Tue, 30 Mar 2010, Matija Grabnar wrote:


Er, not exactly. Read
http://www.cvsup.org/howsofast.html


I had read  http://www.cvsup.org/faq.html#features  item #3.

From what I can see, cvsup uses the rsync algorithm on a file-by-file basis 
(it uses just the differential send part of the rsync algorithm). It doesn't 
rsync the whole tree, which was what I understood to be the original problem 
(wasn't the complaint about the flood of stats?).


Sounds like I may have interpreted the FAQ incorrectly, then.  Thanks for
pointing that out.  I have a few questions, though: the explanation says:

   At the same time, the Tree Differ generates a list of the server's
   files.

That seems to imply that it's doing the exact same thing as rsync, so all
the stats are still present on the server, right?


Nowhere do I see it mentioning that the daemon is maintaining state between
requests.  The primary speed-ups (beyond special file update handling) is
better use of bidirectional bandwidth.

Do you have access to a cvsup server so you can verify its behavior?

So if you want to make a tool that works fine for large mirrors, your 
priority apparently should be to reduce the lots of stats part which is 
used to determine exactly what files need to be considered for checking. 
(Rsync already makes sure all the *other* I/O operations are minimized).


Agreed.

Now the key, as I see it, is that unlike all the other use cases where rsync 
is used, large mirrors are likely to have their directories directly 
transfered from another mirror. So, the client that pulled the tree update 
down could store a list of changed files, and the server could then just use 
that list to determine which files
need to be synced to the downstream mirror. (Sure, the original site has to 
generate the list, but if they use a tool like PAUSE to upload the files, 
that shouldn't be hard to do).


Agreed, but I'm not sure we've gotten past the stat storm on the server.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-30 Thread Arthur Corliss

On Tue, 30 Mar 2010, Rene Schickbauer wrote:

snip

This could work like any modern, distributed version control systems. That 
way, the user would also be able to apply local patches and/or deciding which 
changesets to pull in from the main server. Or have a complete, local mirror 
and one for the production systems where he/she pulls in changes after they 
have been reviewed.



NOW its time to kick my butt, if you want to.


:-) No one can accuse you of not being ambitious.  It's a neat idea, but
definitely an involved solution.  While it could solve a lot of problems I
think the human component is going to be your biggest obstacle.  As we've
seen from the reaction to the heretical notion of ditching rsync I have to
imagine getting everyone to ditch their favorite RCS tool would be even
worse.

Basically, we should just all get onboard with git (disclaimer:  I don't use
git myself, so my understanding may be deficient), a decentralized
distributed RCS.  And have developers periodically merge their branches.

Tough sell.  It probably would solve a bunch of issues, but you're treading
into vi versus emacs territory.  ;-)

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, dhu...@hudes.org wrote:


The entire point of rsync is to send only changes.
Therefore once your mirror initially syncs the old versions of modules is
not the issue. Indeed, removing the old versions would present additional
burden on synchronization! The ongoing burden is the ever-growing CPAN.


That's not entirely true, particularly when you're talking about rsync.
Remember, old synced data doesn't have to be transferred, but it still needs
to be checked for potential changes, something rsync does for every request.
That generates a crap load of I/O in the form of stats on the server.


The danger in a CPAN::Mini and in removing old versions is that one is
assuming that the latest and greatest is the one to use. This is false.
Take the case of someone running old software. I personally support
systems still running Informix Dyanmic Server 7.31 as well as systems
running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
DBD::Informix withdrew support for IDS 7.31 I would need both the last
version that supported it as well as the current.  I can get away with
upgrading Perl, maybe, but to upgrade the dbms is much more problematic
(license, for one thing; SQL changes another).


This is a good example of the potential pitfalls of pruning, to be certain.  Even if
all the authors dutifully documented all the necessary scenarios that would
require pinning specific versions on CPAN it's almost guaranteed that
there's still going to be collateral damage.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, Nicholas Clark wrote:


Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?

(Not even knowledge, let alone channels of communication with, let alone
control over)

Because (as I see it, not having done any of this) the logistics of that is
going to have as much bearing on trying to change protocols as the actual
technical merits of the protocol itself.


I do run mirrors and am mirrored from.  Not on the scale of CPAN (in terms
of file count), but having been long aware of the effect of rsync servers I
have explored the scalability aspects of it.

It should have been obvious that trying to facilitate a cut-over to a new
syncing tool can't be done on this scale in one fell swoop.  Obviously,
there'd have to be a gradual migration where protocols are supported
concurrently, much like FTP & rsync are currently both supported.  We add a
new option and encourage people to move over.  Since we already have a list
of the public mirrors we should have some idea of where to start that
conversation.


Most of the cost of rsync is an externality to the clients. If one has an
existing mirror, one is using rsync to keep it up to date, what's the
incentive to change?


Common sense and professional courtesy.  Especially because it's likely that
some clients running public mirrors may be a sync source for some private
mirrors.  They may not feel the pain of the master repositories, but they
certainly share a portion.  And it's not likely that many mirrors have a 
capital budget to support scaling a free service, so it would be best to 
make efficient use of those resources.



I'm missing something here, I suspect. How can HTTP be more efficient than
rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
instruct a client (such as wget) to get it all. In which case, in the course
of doing this the client is going to recurse over the entire directory tree
of the server, which, I thought, was functionally equivalent to the behaviour
of the rsync server.


You are missing something, but I may not have been explicit enough.  HTTP or
FTP can easily be the payload transport, once you know the precise files
that need to be transferred.  That is tremendously more efficient than what
rsync does on the server.  So, use rsync (or FTP mgets, etc.) to transfer
your transaction logs, compile a list of new files to retrieve, and use the
very common and low-overhead protocols to transfer the files...

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, Elaine Ashton wrote:


I do very much like Tim's proposal for giving old modules a push to BackPAN 
since, with proper communication of the changes to the authors along with a way 
to mark exceptions, this would rid CPAN of a lot of cruft that should be on 
BackPan anyway.


I'm not trying to be a dick (not intentionally, anyway), but isn't that
basically making your problem BackPan's problem?

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, Andreas J. Koenig wrote:


Says the author of a module named Paranoid. A lovely coincidence.


:-) As they say, just because you may be paranoid, it doesn't mean that no
one's out to get you.


If you want to study the CPAN checkpointed logs solution running on
the very CPAN for exactly one year now: File::Rsync::Mirror::Recent

What needs to be done is really extremely trivial: rewrite it in C and
convince the rsync people to include it in the rsync code base. Just that.

So are you a taker, Arthur?


Heh, nice.  That sounds much more involved than my proposal, plus it leaves
us entirely at the mercy of an outside organization (the rsync folks) who
may or may not care about our needs.

I think it would be a worthy cause ultimately, but certainly a much longer
time to implementation, and considerably more effort.  Kind of sounds like
the normal stonewalling I've been getting these last few days by our
resident rsync fetishists.

Very ironic.  I use the hell out of rsync, just more discriminately than you
guys, and yet I'm public enemy number one.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, Dana Hudes wrote:


Use of wget and http to download an entire site means numerous TCP opens and 
HTTP GET requests. The entire point of rsync is that it knows there are 
numerous downloads. It does ONE open. This allows TCP slow start to ramp up


That wasn't exactly what I was suggesting.  And we'll ignore HTTP's
Keep-Alive support for the time being which negates your TCP open issue.  If
you're fetching transaction logs by which you can determine beforehand
precisely what files to retrieve HTTP or FTP will beat the pants off of
allowing rsync to tell you what you need to retrieve and delivering it.


A multi-download session with ftp is also efficient. Clients like ncftp have 
batch transfer built in. If setting up an initial mirror you might do better 
with ftp but maintaining it is where rsync rules.

I haven't looked closely but I have the impression from watching wget work that 
wget using HTTP::Date opens two TCP connections per file: it opens a socket and 
issues a request for timestamp then closes it then opens a socket to issue an
http GET if it wants the file. Then it closes that socket and the process
repeats for next file. It keeps hoping for the timestamp even if the server
doesn't support http::Date

Rsync and ftp are stateful; http is not. For absolute getting one file http is 
better since you skip the whole login thing and setting up data and control 
sockets.
So a CPAN client session will do better with an http mirror: it gets a tar.gz 
opens it up processes it and then goes back many seconds from original request 
for the first dependency. Repeat until entire dependency tree is completed


Dude, you definitely don't understand what we're discussing.  And neither
rsync, ftp, nor http is stateful -- that's the problem.  Rsync has to
build a picture of the repository's state *per* request, even for the old files
that haven't been touched in years.  It then uses that information to select
and deliver the new files you need.  Maintaining state means that you
maintain knowledge of state over time, across multiple requests.  And rsync
doesn't do that, it simulates it.  Quite cleverly, but in a very
expensive way which is borne by the server.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss

On Sun, 28 Mar 2010, Dana Hudes wrote:


Why is rsync a problem? Where is the bottleneck in the protocol or the code 
implementing it?
Specifics!
SAR is antiquated and doesn't give the info you really need. Using a linux system?
Use procallator and feed resulting collected data to ORCA. Better yet, use
DTrace or at least truss.  Compile rsync with profiling code -- use Sun Studio
12; it runs on Linux as well as Solaris and it's a free download.


Wow.  You kids and your new shiny toys...  Look, here's a nice little
specific example for you.  I run an rsync server that contains 8,700+ files
and directories.  Now, say I want to sync a mere thirty-two new files.
Making that request on my server causes the rsync daemon to stat the entire
hierarchy to the tune of 18,000+ f & lstats.  Per request.  Freaking ouch.
And that's a tolerable use-case in my mind for rsync.  That's a hell of a lot of
I/O generated which would take but a couple of stats to retrieve via HTTP or
FTP.  Assuming you knew what you needed already.

Now, when you add in a file set of sufficient size to exhaust filesystem
caching, plus a crap load of concurrent requests, my archaic SAR reports
written on stone tablets tend to say your I/O wait states start pushing the
load levels unacceptably high, not to mention the pages being thrashed from
memory's cache pool, high interrupts and excessive seeks on the drives, and
so on and so forth.  *sniff*  Cavemen are people, too.

Now, look at the size of CPAN with *hundreds* of thousands of files.  Can
you imagine that amount of I/O *per* request?!
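
If anyone wants to put their own numbers on it, a full-tree walk -- which is
effectively what the rsync daemon repeats per request -- is easy to count.
This is just a back-of-the-envelope sketch; point it at whatever tree you like:

    #!/usr/bin/perl
    # Count the entries in a tree: roughly the number of stats/lstats an rsync
    # module listing costs the server on *every* request against that tree.
    use strict;
    use warnings;
    use File::Find;

    my $root  = shift @ARGV || '.';
    my $count = 0;
    find(sub { $count++ }, $root);   # every entry visited means at least one stat/lstat
    print "$count files and directories -- at least that many stats per rsync request\n";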


From a network protocol perspective rsync is quite good. If your network 
capacity is so large that it exceeds bandwidth or IOPs of your disks you 
probably can afford better disks or a more efficient disk storage layout.
Are mirrors like nic.funet.fi running multiple gigabit WAN connections?  If so 
they could sure demand stream more than a bunch of SATA2 disks can provide.

Without performance data it's a waste of time to argue against rsync


And without having examined how rsync works on both ends it should be just as
much a waste of time to argue the merits of rsync.


--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Fri, 26 Mar 2010, Elaine Ashton wrote:


Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2 
years which is the canonical mirror for a large number of mirrors and the 
perspective of having a few terabytes spinning in storage changes quite 
dramatically when you are actually serving a few terabytes to thousands of 
clients. CPAN grew to be quite a burden on the site not only because of the 
high demand, but also because of the multitude of small files and I'm sure 
other mirrors feel similarly burdened.


Don't be such an arrogant prick.  You guys made baseless assumptions about
people's experience with storage management in an attempt to disregard their
opinions.  That's being a dick by any metric.


The sort of pruning Tim brought up has long been an idea, but with the current 
and growing size of the archive, something does need to be done to alleviate 
the burden not only on the canonical mirrors, but also on the random folks who 
want to grab a local mirror for themselves. In my present work environment, 
12gb isn't a lot of disk space, but it's a lot considering I don't need to 
install perl modules daily and the vast majority of it I'll likely never use. 
It would be a kindness to both the mirror operators and to the end-users to 
trim it down to a manageable size.


I think I was quite explicit in saying that efficiencies should be pursued
in multiple areas, but the predominant bitch I took away from your thread
dealt with the burden of synchronizing mirrors.  What's the easiest way to
address that pain?  I don't believe it's your method.  I'd look into the
size issue *after* you address the incredible inefficiencies of a simple
rsync.


As for efficiency, rsync remains a good tool for the job that works on nearly 
every platform which is a rather tall order to match with any other solution. 
Relegating the cruft to BackPAN to make the current CPAN slimmer and less 
demanding on all fronts is an idea that would be welcomed by more than just 
mirror ops.


Rsync is an excellent tool for smaller file sets.  I use it to sync my own
mirrors; those mirrors are typically ~10k files.  Am I surprised that it
doesn't scale when you're stat'ing every single file?  No.  Which is why
alternatives should be considered.  A simple FTP client playing a
transaction log forward is trivial.

I maintain several mirrors, most with rsync.  But that's with a clear
understanding of the size of the file set.  Use the right tool for the job.
And it seems apparent to me that rsync isn't the right tool for ~200k files.


The only snag I can foresee in trimming back on the abundance of modules is the
case where some modules have version requirements for other modules where it 
will barf with a mismatch/newer version of the required module (I bumped into 
this recently but can't remember exactly which module it was) but I think it's 
rare and the practise should be discouraged.


Try doing a simple cost-benefit analysis.  What you guys are proposing will
help.  But not as much as simpler alternatives.  Like replacing rsync with a
perl script and modifying PAUSE to log the transactions.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Nicholas Clark wrote:


I

You?

Or someone else?


I am quite happy to agree that your understanding and experience of storage
management is better than mine. But that's not the key question, in a
volunteer organisation. The questions I ask, repeating Jan's comments in
another message, are.


Oh, I understand that fully.  And I'd be happy to lend some of my time.  But
you don't make people inclined to help when they're lobbing snarky
comments like "we'll wait breathlessly for you to do it".  The impression
I'm getting from most of you right now is that you're hell-bent on solving
the problem your way, and no one is interested in exploring the technical
merits of other approaches.

Hell, I would even help with work towards your desired method *if* I thought
that was the consensus after a genuine exchange and consideration of ideas.
I definitely won't should it appear that we have some kind of elitist cabal
that will make their decision in isolation.  If that's going to be the case
then this should have never been raised on an open forum like the module
author's list.

Quite frankly, at times some discussions on this list fail the concept of a
technical meritocracy, and tend towards an established aristocracy.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote:

The time-honored tradition of many open source communities is to talk. And 
talk.  And talk.  The problem is that this solves nothing.  To do, does.


You are free to decide to take this as a personal insult.


I didn't take it as an insult, I took it as what it was -- a dodge.  You
already have your minds made up and are not willing to evaluate options
on their merits.

Let's just be honest about what's going on here.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Elaine Ashton wrote:


Actually, I thought I was merely offering my opinion both as the sysadmin for 
the canonical CPAN mothership and as an end-user. If that makes me a prick, 
well, I suppose I should go out and buy one :)


:-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
from multiple directions.  My point still stands, however.  Your experience,
however worthy, has zero bearing on whether or not my experience is
just as worthy.  Even moreso when you guys have zero clue who you're talking
to.  And you shouldn't have to know.  I would have thought simple communal 
and professional courtesy would be extended and all points considered in 
earnest.  Which does not appear to be the case.



And you're disregarding a considerable problem that rsync is a well-established 
tool for mirroring that is easy to use and works on a very wide range of 
platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when 
they often have several or more, likely won't be met with much enthusiasm and 
would create two tiers of CPAN mirrors, those using rsync and those not, which 
would not only complicate something which should remain simple but, again, 
doesn't address the size of the archive and the multitude of small files that 
are always a consideration no matter what you're serving them up with.


Ah, you're one of them.  All objects look like nails when all you have is a
hammer, eh?  Rsync is a good tool, but like Perl, it isn't the perfect tool
for all tasks.  You've obviously exceeded what the tool was designed for,
it's only logical to look for (or write) another tool.  Ironically, what I'm 
suggesting is so basic that rsync can be replaced by a script which will 
likely run on every mirror out there with no more fuss than rsync.



FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't 
remember the last time I even used ftp come to think of it. I had to go through 
2 layers of network red tape just to get rsync for a particular system I wanted 
to mirror CPAN to at work. Asking for FTP would have been met with a big no or 
a cackle, depending on which of the nyetwork masters got the request first.


Sounds like you may be hamstrung by your own bureaucracy, but that's rarely
the case in most of the places I've worked.  Not to mention that between
passive mode FTP or even using an HTTP proxy (most of which support FTP
requests) what I'm proposing is relatively painless, simple, and easy to
secure.  This concern I suspect is a non-issue for most mirror operators.
Even if it was, allow them to pull it via HTTP for all I care.  Either one
is significantly more efficient than rsync.


How is replacing rsync, a standard and widely used tool, simpler for mirror 
ops? I suppose I don't understand the opposition to trimming off the obvious 
cruft on CPAN to lighten the load when BackPAN exists to archive them. There is 
already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) 
so it's not as though lightening the load is a new idea or an unwelcome one.


I'm not opposed to trimming the cruft, but I am opposed to ignorant
knee-jerk reactions bereft of any empirical data (or at least data you haven't
shared).  The cruft, while being cruft, isn't inherently evil.  You have a
basic I/O and state problem.  And the I/O generated is predominantly caused
by rsync trying to (re)assemble state on the file set, *per* request.  More
appallingly, most of that state image being generated is state that hasn't
changed in quite a while.  Literally years in many cases.  So why are we
wasting cycles & I/O performing massively redundant work?

That's why having PAUSE implement a transaction log, and perhaps a cron job
on the master server doing daily checkpointed file manifests, is so much more
efficient.  An in-sync mirror only needs to download the latest transaction
logs and play them forward (delete certain files, download others, etc.).
And, gee, just about every author on the list could write *that* sync agent
in an evening.  Out-of-sync mirrors can start by working off the checkpoint
manifest, get what's missing, and roll forward.
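
For what it's worth, here's roughly what I mean by "an evening" -- a toy agent
that plays such a log forward.  The "ADD path" / "DEL path" log format and the
upstream URL are assumptions of mine, since no such log exists today:

    #!/usr/bin/perl
    # Toy sync agent: play a transaction log forward against a local mirror.
    # The log format (one "ADD path" or "DEL path" per line) and the upstream
    # host are hypothetical; PAUSE publishes no such log today.
    use strict;
    use warnings;
    use File::Basename qw(dirname);
    use File::Path     qw(make_path);
    use LWP::Simple    qw(mirror);
    use HTTP::Status   qw(is_success);

    my $upstream = 'http://www.cpan.org';    # any HTTP mirror would do
    my $root     = shift @ARGV or die "usage: $0 /path/to/mirror transaction.log\n";
    my $logfile  = shift @ARGV or die "usage: $0 /path/to/mirror transaction.log\n";

    open my $log, '<', $logfile or die "cannot read $logfile: $!";
    while (my $line = <$log>) {
        chomp $line;
        my ($op, $path) = split ' ', $line, 2;
        next unless defined $path;
        if ($op eq 'DEL') {
            unlink "$root/$path";                    # drop what upstream purged
        } elsif ($op eq 'ADD') {
            make_path(dirname("$root/$path"));       # ensure the target dir exists
            my $rc = mirror("$upstream/$path", "$root/$path");
            warn "fetch failed for $path: $rc\n" unless is_success($rc);
        }
    }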

What you're overlooking is that CPAN has grown, and will continue to grow.  Even
if you remove the cruft now, at some point it might grow to the same size
just with fresh files.  When that happens, you're right back where you are
now.  Rsync can't cut it; it wasn't designed for this.


Whether you like it or not, even on a pared down CPAN rsync is easily your
most inefficient process on the server.  If you're not willing to optimize
that, then you really don't care about optimization at all.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Arthur Corliss

On Fri, 26 Mar 2010, Ask Bjørn Hansen wrote:


I find it curious that everyone who's actually involved in syncing the files or running 
mirror servers seem to think it generally sounds like a good idea and everyone who 
doesn't say it's not worth the effort.


Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of
storage as part of my day job.  I think it's a tad presumptuous to disregard
input just because we're not in your inner sanctum.  As I mentioned in a
follow up e-mail:  this is simply a matter of selecting the correct problem
domain.  I believe that streamlining the mirroring process will provide
greater gains for less effort.

That's not to say that pursuing other efficiencies isn't worthwhile, just
that you need to prioritize.

But what the hell do I know.  I don't run a *CPAN* mirror, so I must be
freaking clueless...

--Arthur Corliss
  Live Free or Die

Re: Module uploaded - whats next?

2009-12-09 Thread Arthur Corliss

On Mon, 7 Dec 2009, Jonathan Rockway wrote:


* On Fri, Dec 04 2009, Bill Ward wrote:

Yep, that's why I didn't use Catalyst and would never suggest it to
anyone... it's an IT nightmare.


Seriously?

My Catalyst app works as soon as I type

   # apt-get install libcatalyst-perl

on my Debian Stable system.  It isn't quite Java + WAR, but it is
also no IT nightmare.

Please get the facts straight before spreading FUD.  Your argument would
be more persuasive if it was true.


I have no beef with you, Jonathan, but I think you're missing the mark; it is
a legitimate criticism.  I maintain my own distribution that I use for work
and personal use.  The reality is that I have to do that packaging work for
my distribution.  Now, bear in mind that I do use Catalyst, but it's a fact
that overly complex applications with a long list of dependencies do place a
huge burden on IT infrastructure.

It wouldn't be bad if Catalyst was the only major Perl code I had to
support, but, gee, I happen to like and use Perl almost freaking everywhere.
And when you have that many packages it's a question of *when*, not *if*, an
update to a common dependency will break one or more applications.  APIs
change, certain functionality gets deprecated, or code bases are split into
separate module namespaces, etc.  I'm not saying that this is a frequent
occurrence, but it does, has, and will happen again.

Catalyst has by far the longest list of dependencies of any Perl code I
support, bar none.  It's a very flexible and extendable framework, but it
would be naive to think that that doesn't come at a price.  And I pay that
price regularly.

Now, before I hear "just use Debian", let me dispel that ignorance.  One of
the primary reasons why I maintain my own distro/packages is to make sure
that revs & APIs don't change underneath me just because some distro
developer gets a wild hair and wants to live on the bleeding edge.  That,
and to keep them from doing stupid things like, say, lobotomizing the RNG in
openssl.

I have better things to do with my time than wonder if apt-get is going to
pull in a new magical combination of revisions that's going to break some
code I need to just work.

--Arthur Corliss
  Live Free or Die


Re: Module uploaded - whats next?

2009-12-04 Thread Arthur Corliss

On Fri, 4 Dec 2009, Dave Rolsky wrote:

The idea that you couldn't learn the basics of Catalyst and get things 
running in the same time seems unlikely.


Also, you haven't factored in all the time it's going to take you to add 
features and fix bugs already present/fixed in an existing tool.


I have no interest in any possible outcome of this discussion, but I have to
throw out a small tangential comment:  as someone who has to support a
Catalyst deployment I'd like to know if anyone here has unrolled the
ridiculously long list of dependencies necessary for Catalyst?!

As a fellow dev I often struggle with striking the balance between not
reinventing the wheel and not including the wheel that has its own tractor
trailer attached, towing a mobile factory.  And as an admin I may have to
err towards a framework that provides the core functionality I need with
the least number of moving parts.

Let the guy introduce another framework.  None of the existing frameworks
is devoid of sizeable cons.

--Arthur Corliss
  Live Free or Die


Re: Weekend entertainment

2009-11-16 Thread Arthur Corliss

On Mon, 16 Nov 2009, Jonas Brømsø Nielsen wrote:


Aloha,

Most cultures/countries have a redneck equivalent, so it is not totally wasted 
on us non-us... anyway. I say if we have offended anybody, we remove the 
offending distribution. It is in the acme namespace, so it will not create a 
lot of fuss removing it.

The author has the choice and freedom of distributing via other channels, but 
we as a community might have to be more democratic about it, so I say it goes. 
As for the apology, I think we could just get by with a friendly and polite 
email, informing the offended person of our actions.

Going into a deeper analysis of the nature of humor and cultural differences is 
a waste of time, CPAN is an international resource and therefore we should act 
accordingly. The Acme namespace is still open and welcoming to all the crazy 
experiments, but we have to play nice at the same time.

jonasbn


Personally, I believe that anyone who's wasted more than 30 seconds on this
entire discussion should get a life.  It's more than obvious what the author
was satirizing and no one has an inalienable right to not be offended.

If you don't like it, fine, blog about it, complain to your friends.  But
let the rest of us move on with matters of some actual import.  Quit trying
to be the politically correct thought police of the world.

--Arthur Corliss
  Live Free or Die

Re: Weekend entertainment

2009-11-16 Thread Arthur Corliss

On Mon, 16 Nov 2009, Jonas Brømsø Nielsen wrote:


Hi Arthur,

We have a saying in Danish: "go for the ball, not the man".


I don't disagree with your saying, but this thread needs to go away.  It
seems to be much ado about nothing by the very thin skinned.  Somebody
invoke the usenet Hitler rule quick!

I've now wasted two e-mails on this subject.  I may develop a rash.

--Arthur Corliss
  Live Free or Die

Re: Help needed testing security of login module

2009-05-20 Thread Arthur Corliss

On Wed, 20 May 2009, Bill Ward wrote:


2. Make sure to have a salt value, as it prevents the use of rainbow

tables to get a password. So you have the hash and a known salt kept
separately (the salt is plaintext), and when you check the password
you check: sha256(passphrase + salt) == sha256(passphrase_entered +
salt)


I'm not doing that, but that wouldn't be hard to add.  I didn't think that a
salt was necessary with a one-way hash.


Salts are a way of combating the use of rainbow tables, which are databases
of precomputed values within certain bounds.  They make brute-force attacks
virtually painless, because now it's just a lookup.  But don't add a static
salt; that's almost as pointless as not using one at all.  If you're going
to use salts, make sure you generate a new one every time, preferably
pulling a few bytes from /dev/u?random or similar.
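For instance (a rough sketch using Digest::SHA; the salt length and storage
format are whatever suits you):

    use Digest::SHA qw(sha256_hex);

    sub make_hash {
        my ($passphrase) = @_;
        open my $rand, '<', '/dev/urandom' or die "no /dev/urandom: $!";
        read $rand, my $salt, 8;              # fresh salt for every password
        close $rand;
        my $hex_salt = unpack 'H*', $salt;
        return ($hex_salt, sha256_hex($hex_salt . $passphrase));
    }

    sub check_hash {
        my ($passphrase, $hex_salt, $stored) = @_;
        return sha256_hex($hex_salt . $passphrase) eq $stored;
    }

Store the salt in plaintext right next to the hash; its only job is to make
precomputation useless.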


If you're really paranoid you'll also do key strengthening, similar to what
most system authentication does.  Hash with a salt, then hash the result
with the salt, repeat a few thousand times.
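Something along these lines (the round count is plucked out of thin air):

    use Digest::SHA qw(sha256);

    sub strengthen {
        my ($passphrase, $salt, $rounds) = @_;
        my $digest = sha256($salt . $passphrase);
        $digest = sha256($salt . $digest) for 1 .. $rounds;   # repeat a few thousand times
        return unpack 'H*', $digest;
    }

    my $hash = strengthen('s3kr1t', 'per-user-salt', 5000);

Each extra round costs the legitimate user microseconds, but costs the
brute-forcer that much again for every single guess.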

--Arthur Corliss
  Live Free or Die


Re: Help needed testing security of login module

2009-05-20 Thread Arthur Corliss

On Wed, 20 May 2009, Jonathan Yu wrote:


There are web sites that specialize in that sort of thing. So having a
2-byte salt can really help stop those attacks, or at least make the
amount of space needed infeasible (since every different 2 character
salt will require you to generate an entirely different rainbow
table).


16 bits of salt puts a rainbow table covering every salt value at roughly
64TB.  That's doable in this day and age; I'd go at least four bytes, if not
more.  Adding a larger salt incurs virtually no penalty for legitimate
users, but makes it uneconomical for the attackers.
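(Back-of-the-envelope, assuming on the order of 1GB of table per salt value:
2^16 salts x 1GB is 64TB, which a determined attacker can stockpile; 2^32
salts x 1GB is roughly 4EB, which nobody is storing any time soon.)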

--Arthur Corliss
  Live Free or Die


Re: Help needed testing security of login module

2009-05-20 Thread Arthur Corliss

On Wed, 20 May 2009, Jonathan Yu wrote:


Not totally pointless, of course, because it would still require
regenerating a rainbow table versus downloading one of them already
available. On the other hand, depending how popular your application
gets, this can be dangerous -- take for example Microsoft's Lan
Manager Hash algorithm, LMHash. Even though it is a specialized
algorithm, it became popular enough to make it feasible/useful to
create and distribute rainbow tables for. So your point is valid in
that case, and it never hurts security nor is it a big deal on
performance.


I would suggest that the benefit of a static salt is marginal at best, since
many of these hash algorithms aren't exactly computationally intensive on
today's hardware.  If you have a guy trying to crack passwords from a shadow
file, he only has to generate one table for all of them, versus a table per
account.  Per-account salts make it orders of magnitude more difficult in
that regard, especially if you expand the scope to all users of an
application everywhere.


And /dev/random can be slow, so urandom is a better suggestion, or
even better, using /dev/random to seed a random number generator
algorithm like the Mersenne Twister (which is essentially what
/dev/urandom does)


Which is why I included urandom as a suggestion.

--Arthur Corliss
  Live Free or Die


Re: Help needed testing security of login module

2009-05-20 Thread Arthur Corliss

On Wed, 20 May 2009, Jonathan Yu wrote:


That's a pretty valid point. If it's a simple auth system as I
understand it, though, then the users don't have different
permissions, so there's really no point in cracking *all* of the
passwords if you can download all the data with one.


No arguments on that.  :-)  Virtual users versus real system users will
always be a weak link that can be attacked on web apps.

And Bill's right in that if someone already has your hashes they probably
already have access to the rest of it as well.  This exercise is merely
about protecting what was used to generate the hashes.  Everything else is a
separate issue.

--Arthur Corliss
  Live Free or Die


Re: a lot of controversy about Module::Build

2009-04-09 Thread Arthur Corliss

On Thu, 9 Apr 2009, Eric Wilhelm wrote:


I, as a module author providing you a free product, don't have to give
a damn. Realistically, authors give some amount of damn, but maybe
not a full I'll support Perl 5.004 for the poor slobs using ancient
Red Hat boxes.


Exactly.  If you treat Perl like a legacy language, there won't be any
new users and you won't have any problems of some new code not being
compatible with your old code because there won't be any new code.


But if you treat Perl like there are only new installations out there, you're
going to be ignoring a huge installed base of older machines, and your code
won't get used.

You guys have the right to do whatever you want with your code, and I'm not
advocating that everyone should support fifteen years of Perl revisions.  I
am, however, saying that if you really want the Perl community to largely
benefit from contributions you need to be conscious of what the installed
base out there is using.

I highly doubt the majority of Perl *users* (not developers) out there are
as bleeding edge as yourselves.

--Arthur Corliss
  Live Free or Die


Re: a lot of controversy about Module::Build

2009-04-09 Thread Arthur Corliss

On Thu, 9 Apr 2009, David Cantrell wrote:


That's a hugely optimistic and naive statement, even if it's true most of
the time in the Perl community.


But lots of people who use modules from the CPAN aren't really in the
perl community, and that's important.  Actually, there are lots of
people *in* the community who don't keep their toolchain up to date and
have no idea why it might be a good idea to upgrade from the CPAN.pm
that they installed a few years ago.


I think you misread my statement.  I was saying that assuming everything
just works better because it's a newer rev is, again, optimistic and
naive.  For that reason many of us choose to lag simply to let you blokes
that like cutting yourselves on the bleeding edge sort out the conflicts
for us.  :-)  For that reason I won't adopt Eric's philosophy of wanton
upgrading.


But, anyway, is it a problem we really need to be inflicting on new Perl
users?  Do they have to care if somebody might be running 5.8.8
somewhere?  With 5.10.0 out for well over a year now?

Hell, yes, *I* care.  Developers should be aware of portability if they
expect the code to run anywhere outside of the machines they control.


Yes!

I care because not all my machines have been upgraded to 5.10.  I care
because not all the machines at work have been upgraded.  I care because
if I deliberately restrict my code to 5.10 only, then I restrict the
number of people who will be inclined to do my work for me and send
patches to fix my bugs.

A common plaint I hear about perl code *from people outside the
community* is that we have too many dependencies and our code is too
hard to install.  And I can sympathise.  If you don't know how to
configure CPAN.pm to automagically follow dependencies (incidentally,
why is the default prerequisites_policy still 'ask' and not 'follow'?)
then it's a gigantic pain in the arse.  If on top of that you want them
to *upgrade perl* they're going to think you're mad.

And we should care about people outside the community, because they
vastly outnumber those of us *in* the community.  They and their
opinions are important because they do things like influence which
technologies their employers use, and consequently how many jobs there
are for us.


Amen.  I bow to your more eloquent explanation.

--Arthur Corliss
  Live Free or Die


Re: a lot of controversy about Module::Build

2009-04-08 Thread Arthur Corliss

I hope you guys don't mind if I interject...

On Wed, 8 Apr 2009, Eric Wilhelm wrote:


That depends on who one is.  If you're writing specifically for people
who keep their toolchain and perl religiously up-to-date,


There's nothing religious about it.  You upgrade, it works better.


That's a hugely optimistic and naive statement, even if it's true most of 
the time in the Perl community.  Regressions happen.



But, anyway, is it a problem we really need to be inflicting on new Perl
users?  Do they have to care if somebody might be running 5.8.8
somewhere?  With 5.10.0 out for well over a year now?


Hell, yes, *I* care.  Developers should be aware of portability if they
expect the code to run anywhere outside of the machines they control.  The
reality is that there are a lot of installations that lag current perl
releases by years, either because some OS versions are in maintenance-mode
only, or because many commercial Unices are always slow to upgrade.  As I
said before, regressions happen, and the "bleeding edge" is called
"bleeding" for a reason.  For those reasons I still test my code back to
Perl 5.6.x.



And anyway, if the trouble with using something is that it's not core,
the fix is not to get it into the core.  Rather, we should try to
make coreness not matter.


You're right, but you're massively oversimplifying the problem.  Practical
reality has to have influence at some point.  I still use EU::MM myself
because I know that it will work pretty much everywhere.  Not everyone is
willing (and rightfully so) to install twenty other modules just to install
and use the functionality of one.

--Arthur Corliss
  Live Free or Die

Re: Perl bug, or author bug?

2009-03-06 Thread Arthur Corliss

On Thu, 5 Mar 2009, Arthur Corliss wrote:

snip

Turns out this is an unsigned-int-to-signed-int casting problem, not a
64-bit-cleanliness problem.  Legitimate bug in Perl either way, and a patch
should be submitted to the devs shortly.
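For the curious, you can see the effect from Perl itself by reinterpreting
the same 32 bits as signed:

    perl -e 'printf "%d\n", unpack("l", pack("L", 4294967294))'   # prints -2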

--Arthur Corliss
  Live Free or Die


Perl bug, or author bug?

2009-03-05 Thread Arthur Corliss

Greetings:

I have to ask everyone here to ascertain if I'm going daft.  Here's the
situation:

32-bit Perl on a 64-bit platform.  getpwent(2) returns 64-bit UIDs just
fine.  getgrent(2), however, truncates them to a 32-bit value.  GIDs
returned from functions like stat(2) work just fine, however.

I'm seeing this on AIX 5.3:

# perl -e 'do { @pw = getpwent } until $pw[0] eq "nobody"; print $pw[2], "\n";'
4294967294
# perl -e 'do { @pw = getpwent } until $pw[0] eq "nobody"; print $pw[3], "\n";'
4294967294
# perl -e 'do { @pw = getgrent } until $pw[0] eq "nobody"; print $pw[2], "\n";'
-2
# grep nobody /etc/passwd /etc/group
/etc/passwd:nobody:!:4294967294:4294967294::/:
/etc/group:nobody:!:4294967294:nobody,lpd

Legitimate bug in Perl?  This one is version 5.8.2, BTW.

--Arthur Corliss
  Live Free or Die


Re: Another non-free license - PerlBuildSystem

2007-02-16 Thread Arthur Corliss

On Fri, 16 Feb 2007, Ashley Pond V wrote:

If there are any law/license experts in the crowd, I'd love to see a 
formal/named/solid version of this sort of license. It's just about exactly 
what I've always wanted to put on all my own code.


What kind of idiocy is this?!  There's a *lot* of people I hate or disagree
with, but I believe in *true* freedom when I release my code.  I'm not going
to tell someone they can't use it just because of their occupation, race,
religion, or sexual orientation.

I find it particularly ironic that people are targeting some of the very
people that allow all of us to express stupid opinions safely.

People that petty have far too much time on their hands.  Just code, damn
it.

--Arthur Corliss
  Live Free or Die



Re: running tests

2004-04-02 Thread Arthur Corliss
On Fri, 2 Apr 2004, Tim Harsch wrote:

 Hi all,
 If I have several test files in my test suite, is there a way to get them to
 run in a predefined order when the user runs make test?  I realize I could
 name them alphabetically like Atest1.t, Bsometest.t, but it seems hokey
 and I'm not sure it would work on all systems.

I think a lot of us just use numeric prefixes to control the order:

  01_ini.t
  02_scalar.t
  03_list.t
  ... etc.

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto


Re: trouble with MakeMaker finding library

2004-03-31 Thread Arthur Corliss
On Wed, 31 Mar 2004, Tim Harsch wrote:

 Hi all,
 My module requires a shared library.  I'd like for the user to have that
 library available at LD_LIBRARY_PATH and call it good enough.  However, look
 at the following.  I get 'No library found' and note: both LD_LIBRARY_PATH
 is set and -L specifies the location

 ***
 use ExtUtils::MakeMaker;

 die "SGE_ROOT environment variable not defined" unless my $SGE_ROOT =
 $ENV{SGE_ROOT};

 # See lib/ExtUtils/MakeMaker.pm for details of how to influence
 # the contents of the Makefile that is written.
 WriteMakefile(
 'NAME'  => 'Schedule::DRMAAc',
 'VERSION_FROM' => 'DRMAAc.pm', # finds $VERSION
 'PREREQ_PM'  => {}, # e.g., Module::Name => 1.1
 ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
   (ABSTRACT_FROM => 'DRMAAc.pm', # retrieve abstract from module
    AUTHOR => 'Tim Harsch [EMAIL PROTECTED]') : ()),
 'LIBS'  => ['-ldrmaa', '-lsocket', '-lnsl', '-lm', '-lpthread'],
 'DEFINE'  => '', # e.g., '-DHAVE_SOMETHING'
  # Insert -I. if you add *.h files later:
  'INC'  => "-L$SGE_ROOT/lib/sol-sparc -I$SGE_ROOT/include",

  # Un-comment this if you add C files to link with later:
  'OBJECT' => '$(O_FILES)', # link all the C files too
 );
 ***
 [868] dna:/home/harsch/CVS/managers/drmaa echo $LD_LIBRARY_PATH
 /home/harsch/CVS/managers/drmaa/Schedule:/home/harsch/tmp/SGE_040310/lib/sol
 -sparc:/usr/local/lib:/usr/local/opt/SUNWspro/lib:/opt/SUNWspro/lib:/usr/ope
 nwin/lib:/usr/dt/lib:/usr/4lib
 [869] dna:/home/harsch/CVS/managers/drmaa echo $SGE_ROOT/
 /home/harsch/tmp/SGE_040310/
 [870] dna:/home/harsch/CVS/managers/drmaa perl Makefile.PL
 Note (probably harmless): No library found for -ldrmaa
 Writing Makefile for Schedule::DRMAAc

BTW, keep in mind that your usage of LIBS will use each array member as a
complete set of arguments to ld, which means that if you had '-ldrmaa' as
the last array member you would never even have gotten that warning.  If
all those libraries are mandatory you need to pass them as a single array
member.  Quoting the pod:

   LIBS
 An anonymous array of alternative library specifications to be
 searched for (in order) until at least one library is found.

In any event, setting LD_LIBRARY_PATH won't override the search path when
testing for libraries; you'll need to add it to LIBS as well:

  'LIBS' => ["-L$SGE_ROOT/lib/sol-sparc -ldrmaa", ...]
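Putting both points together -- the -L path plus all the mandatory libraries
in a single array member -- it might look like this (untested; adjust the
path for your tree):

  'LIBS' => ["-L$SGE_ROOT/lib/sol-sparc -ldrmaa -lsocket -lnsl -lm -lpthread"],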

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto


Re: Namespace suggestions for new module submission (record-level transaction howto?)

2004-01-04 Thread Arthur Corliss
On Sat, 3 Jan 2004, david nicol wrote:

 I am not certain how big Sleepycat's release is any more, but
 I think a DB::Inline done with Inline.pm wrapping sleepycat code
 would be an interesting project.  That might just move the problem
 from library synchronization to making sure that everyone has access
 to a compiler though.

Agreed, that's another concern that I have, which is one of my lesser reasons
for doing a Pure Perl solution.

 I would like to hear more about the record level locking and
 transactions.  Perltie does not support these features: what will
 the interface look like?  My own efforts to do record level locking
 with DirDB from above the perltie are done by:

snip

Keep in mind that I've got a great deal of performance tuning to do, along
with some serious regression tests to make sure everything works the way that
I'm intending.  This isn't stable code yet.  In a nutshell:  I'm cheating the
system through the use of the transaction log (something that will be in use
for *every* write to the db), and I still want to preserve concurrent writes.

Outside the nutshell:  This system is nothing more than an AVL binary tree
implementation, using four files (index, key values, associative values, and
the transaction log).  The write process goes something like this:  check the
transaction log for any open transactions for the same record, write-lock the
log and add the entry (concurrently executing transactions can ignore the
advisory lock to mark their transaction complete).  Update the application
blocks in the relevant files, write-locking only if the files are to be
extended (and, as before, other writes that aren't extending the files can
ignore the lock).
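To give a rough idea of the shape of that logging step (nothing like the
actual code, just the advisory-lock append pattern I'm describing):

    use Fcntl qw(:flock);

    sub log_transaction {
        my ($log_path, $record_id, $entry) = @_;
        open my $log, '>>', $log_path or die "can't open log: $!";
        flock $log, LOCK_EX or die "can't lock log: $!";   # lock only for the append
        seek $log, 0, 2;                                   # end of file, in case it grew
        print $log "$record_id\t$entry\n";
        flock $log, LOCK_UN;
        close $log;
    }

The real log format obviously carries more than that (transaction state,
offsets into the other three files), but that's the general discipline.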

As for transactions:  what I've described above is all I have at the moment.
Atomic record updates.  What I'd like to do at some point is add support in
the log format definition for multiple record updates, but that isn't done
yet.

Another FYI, before someone asks:  I chose four files for storage for a reason
(all reasons are influenced by my feeble-mindedness, of course).  First, I
wanted to be able to crawl/rebalance the binary tree with fixed length records
for performance reasons.  Second, having separate files for the actual values
of the keys and associative values allows me to have full binary storage
capability without worrying about special encoding tricks, etc.  Outside of my
method of tracking available slots of storage (i.e., deleted records) for
reuse, there's nothing but data in those two files, not even record
demarcation.  The transaction log, of course, speaks for itself.

Now, if someone knows a better way, I'm all ears.  :-)

 If you're making up your own file format, how about CorlissDB?

The only problem I have with that is I don't want to give the impression that
this is just another wrapper for yet another C implementation.  Many people
will assume that they'll need some libraries and end up ignoring it.

 You said "support tied hashes" -- Did you mean support for storing
 hash references?

Nope.  It will support the hash binding via the tie() function.  That's the
primary method of use I have for it right now.
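Usage should end up looking like any other *DBM_File module, e.g. (assuming
a DB_File-style argument list; the final signature may well differ):

    use Fcntl;
    tie my %db, 'PerlDBM', 'data.db', O_RDWR|O_CREAT, 0644
        or die "can't tie: $!";
    $db{key} = 'value';
    untie %db;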

 I added support for hash references to my DirDB (and DirDB::FTP)
 modules and would appreciate your feedback on the semantics of the
 interface.  They are as follows:

   When you store a reference to a native perl hash to DirDB,
 the hash becomes blessed so that further manipulation of the referenced
 hash manipulates the persistent store as well.

   When you store a reference to a tied hash to DirDB, you get
 a deep copy.

   When you store anything other than a scalar or an unblessed hash
 reference, the module throws a croak without overwriting or corrupting

 These semantics make it possible to do multi-level autovivification
 inside a DirDB data structure, even over the network (by FTP.)

Sounds interesting.  I haven't used that module before, but I think I'll go
download it and check it out.  I can imagine a few uses for it.  As to the
semantics, I can't speak intelligently on that until I get a fuller feel of
how the module will typically be used.

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto


Re: Namespace suggestions for new module submission

2004-01-02 Thread Arthur Corliss
On Fri, 2 Jan 2004, Mark Stosberg wrote:

 Will this be implemented with the DBI interface? Then DBD::YourProject
 seems appropriate.

 DBD::SQLite seems to be a related case, although it's not Pure Perl,
 it just allows you install it as a standard DBI driver.

I don't think it does enough to warrant inclusion in DBD::*, nor have I
planned to make it accessible via DBI.  It's just another method for
disk-based stateful hashes, like all the *DBM_File modules.  Modules like
AnyDBM_File and DB_File are causing some unpredictable results in some of my
code, depending on the version and implementation of the dbm libs they're
linked against.  This is just my way of getting predictable results without
requiring admins to upgrade or install new system libs, along with the
requisite Perl modules.

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto


Namespace suggestions for new module submission

2004-01-01 Thread Arthur Corliss
Greetings:

In the near future I'd like to submit a module for inclusion on CPAN.  I need
some advice on the appropriate namespace, however, since I don't want to
pollute the top-level namespace.

Unofficial module name (as it's being developed):  PerlDBM
Synopsis:  Pure-perl implementation of a dbm engine.  Supported only on
   platforms with 64-bit filesystems.  Database files are
   portable (all data is stored in network-byte order), with
   record-level locking and transactions.  Has its own API for
   low-level control, but also will support tied hashes.
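(Network byte order in Perl just means the pack/unpack 'N'/'n' templates; for
example, a 32-bit value written this way reads back identically on any box:

    my $packed = pack 'N', 1234567;     # written big-endian regardless of host
    my $value  = unpack 'N', $packed;   # 1234567 again, on any platform

The actual record layout is still being nailed down, so treat that as
illustrative only.)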

I did notice that most of the XS wrappers for C-based implementations were all
in top-level namespace, though.  Any suggestions/preferences?

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto


Re: Submitting a new module? (Linux::ForkControl)

2003-11-13 Thread Arthur Corliss
On Thu, 13 Nov 2003, Brad Lhotsky wrote:

snip

 So I guess, two questions:

 1) Anyone see this as useful?
 2) Is 'Linux::ForkControl' a decent name for this module?

1) Yes.
2) I almost think that a reverse would be better (i.e., ForkControl::Linux,
   or similar).  Your module could provide a generic interface (along with
   a working Linux implementation), and others could contribute other
   platform implementations.  I would be interested in implementations on
   AIX, IRIX, and Solaris, personally.  If I can catch a few projects up
   to date, I'd contribute the modules myself.

As an addendum, I think it would be useful to be able to differentiate between
CPU load and memory load, placing limits on both.

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto




Re: RFC: SQL::ExportDB

2003-09-24 Thread Arthur Corliss
On Wed, 24 Sep 2003, Michael A Nachbaur wrote:

 It sucks being in the latest timezone in North America; it makes it that much
 more difficult to catch up on all the mailing list traffic.

Latest timezone, where are you?  I'm in Alaska, I thought I was in the latest
timezone in NA.  ;-)

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto



Re: RFC: SQL::ExportDB

2003-09-24 Thread Arthur Corliss
On Wed, 24 Sep 2003, Michael A Nachbaur wrote:


 Yes, but Alaska doesn't actually count, you see.  They always draw it on maps
 sort of floating next to California, right above Hawaii which, curiously
 enough, is rougly the same size as Alaska.  Odd, that.  ;-)

LOL.  You've got me there.  As soon as we find a stretch of coastline that
matches our eastern border we'll dock and revisit this discussion.  I hear the
fault lines in CA would be a good match if you'd hurry up and drop into the
ocean.  ;-)

--Arthur Corliss
  Bolverk's Lair -- http://arthur.corlissfamily.org/
  Digital Mages -- http://www.digitalmages.com/
  Live Free or Die, the Only Way to Live -- NH State Motto