Re: [RANT] Should we try to keep compatibility with old perl5s?
On Sat, 11 Aug 2018, Shlomi Fish wrote: Hi all! This post is a little flamebait, so please try to keep the discussion civil. Anyway, after reading the discussion in this public github issue, and following some of the links (especially https://szabgab.com/what-does--if-it-aint-broke-dont-fix-it--really-mean.html ), do you think I was being unreasonable, or should I as a CPAN author/maintainer/adopter accommodate people running old perl5s, in my case 5.10.x and below: https://github.com/shlomif/perl-XML-SemanticDiff/issues/3 This reminds me of what chromatic wrote here - https://www.nntp.perl.org/group/perl.perl5.porters/2008/09/msg140206.html : "This is why we can't have nice things." Any comments or opinions? I think I'll relax by watching a nice and fun video.

I think this begs a question: how many developers are actually testing with those older versions? From a purely pragmatic perspective I'd think devs should only officially support down to the revs they're actively testing, while staying cognizant of the oldest perl revs shipped as part of non-EOL'd Unices, etc. Personally, I'm still supporting 5.008003, but I occasionally consider whether newer syntactic sugar might be worth a jump. I have to admit I quit supporting 5.006005 just because it was getting tedious having to maintain my own patches just to compile and install it. Ideally, whatever your choice, a dev shipping code for the benefit of the community shouldn't be badgered for not wanting to take on the extra maintenance effort. At the same time, said dev shouldn't be surprised if wider use of those same contributions is limited until the broader community catches up. Do what you want, dude. We might not all make the same decisions, but we all get it. --Arthur Corliss Live Free or Die
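On the mechanics of declaring a floor: a distribution that only supports down to a given rev can say so explicitly with perl's `use VERSION`, so an older interpreter dies at compile time with a clear "Perl vX required" message instead of a cryptic syntax error deep in the module. A minimal sketch (the 5.008003 floor just mirrors the example in the post above):

```shell
# 'use VERSION' aborts compilation if the running perl is older than
# the stated floor; on any modern perl this prints the confirmation.
perl -e 'use 5.008003; print "floor satisfied\n"'
```

The same line at the top of a .pm file gives users of too-old perls an immediate, self-explanatory failure rather than a bug report about mystery syntax errors.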
Re: [cpan-questions #32443] Re: rt.cpan.org keeps logging me out.
On Tue, 22 Nov 2016, Shlomi Fish wrote: The problem is that in order to improve the security of my passwords, I keep them all encrypted using a master password. Firefox has a built-in feature for that and, if you don't set a master password, then the passwords are stored using a relatively easy-to-reverse process which every process on the local system can use (or at least those running as the local user). There's some old discussion of it here: http://catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s09.html Since my firefox password is non-trivial, entering it to fill in the rt.cpan.org password whenever I restart firefox, restart my https://en.wikipedia.org/wiki/X_Window_System or restart the machine (for a new kernel, glibc, etc.) is quite a hassle. What would make my life more tolerable would be a browser add-on that would allow me to keep the rt.cpan.org password (and only that) unencrypted (as I already have it in "~/.pause" anyway).

Perhaps this is just me, but there seems to be some cognitive dissonance here. You've clearly put some thought into the security of your passwords, yet you're putting less thought into securing a session token? Or you want a plugin to bypass the normal browser key store? Maybe I'm overthinking this. But, then, I don't trust browsers to begin with. I don't want them maintaining any kind of state for me over any significant length of time. --Arthur Corliss Live Free or Die
Re: Top level name proposal - ComputeCluster
On Fri, 5 Sep 2014, James E Keenan wrote: Could that be shortened to simply: Cluster ? If this happens I'm claiming Cluster::Fu... well, I think you know where I'm going with this ;-) --Arthur Corliss Live Free or Die
Kevin Johnson
Greetings: Seems like there was a rash of abandoned modules by Kevin Johnson dating back to the late '90s. Many of his modules have been rehomed, but he's still sitting on a couple, including a registered namespace that no code was ever published for. Long story short: does anyone know if this guy ever popped up again? The address info listed on his profile doesn't seem to be valid. I'd like to get in touch with him in regard to Net::ICAP. http://search.cpan.org/~kjohnson/ I appreciate your time, and any tips you may have. --Arthur Corliss Live Free or Die
Re: COMAINT on https://metacpan.org/release/String-Random
On Tue, 3 Dec 2013, André Walker wrote: Shlomi++ Not only was he really polite with the original author, he is also contributing to the entire Perl community. He wasn't cocky at all! Great example for whoever wants to contribute to an existing un-maintained module. Cheers, André

Shmuel++, actually. Shlomi's and CPAN's actions are well justified, no question, but people do lose things in their inbox, or just fall off the net for periods with plenty of legitimate reasons. Shlomi wasn't the original author, so when the original author pipes up the most polite action would be to inquire what resources *he's* set up, and see how they can work in concert. When I read Shlomi's response it definitely came off (to me, at least) like he'd conquered the kingdom and Steve had to ask permission to get back in. I'm sure that wasn't Shlomi's intent, but here we are on that faceless Internet again, with none of the normal human cues to aid us. A more conciliatory tone would have helped. --Arthur Corliss Live Free or Die
Re: How to add your avatar to Google search results involving CPAN modules
On Thu, 21 Nov 2013, David Cantrell wrote: Both! I mostly prefer search.cpan.org because I'm used to it :-) which, I admit, isn't a very good reason. I'm in the same boat. I have yet to hear of any reason compelling enough to make me break old habits... --Arthur Corliss Live Free or Die
Re: How to break apart a large distribution?
On Tue, 18 Oct 2011, Fields, Christopher J wrote: Hi, The BioPerl core developers (including myself) have decided to work on breaking up the huge code base into separate distributions on CPAN, using dependencies to install only the needed modules (something WAY overdue). I noticed that several distributions have undergone similar paths (LWP being a recent example). Any pointers we could use? Can this be gradually done (BioPerl is HUGE, around 1000 modules), or should we have these all ready to go at once?

You can do it gradually: just break out the modules with no internal dependencies first, then gradually roll up the hierarchy. Update BioPerl last (or note the new distributions as you issue updates for BioPerl itself). In short, assuming you have on-going development of all this code while you're trying to break it all out, the gradual route is probably going to be more manageable. --Arthur Corliss Live Free or Die
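The "no internal dependencies first, then roll up" ordering above is just a topological sort of the inter-distribution dependency graph, and the stock POSIX tsort utility computes exactly that. A rough sketch (the distribution names and dependency pairs below are made up for illustration):

```shell
# Hypothetical "prerequisite dependent" pairs; tsort prints each
# prerequisite before anything that depends on it, i.e. a safe order
# in which to carve out and release the distributions.
cat <<'EOF' | tsort
Bio-Root Bio-SeqIO
Bio-Root Bio-AlignIO
Bio-SeqIO Bio-Tools-Run
EOF
```

Whatever tsort lists first can be split out and shipped immediately; anything later waits until its prerequisites are already on CPAN.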
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Fri, 9 Sep 2011, Aristotle Pagaltzis wrote: Protecting your communication with another party from third parties needs no justification whatever. It should be the assumed default that exceptions are made from, not the exception from the rule requiring proof. If I'm having a massive argument with my personal foe #1, the fact that I distrust this person on all conceivable levels does not make you welcome to eavesdrop on the conversation. It does not matter the very least bit how trustworthy the other party is: uninvited third parties have no business knowing what you do or do not say to the other party.

This is about assessment of risk, and in the example of Google that's exactly what you're missing. I would agree with you if your traffic was going to a trusted party, i.e., a server under the control of entities you know and trust. But it's not. So who's the greater danger to you? The megalith cataloging and profiling all of your communications across multiple networks and devices, or the script kiddie at the next table? It should be obvious who has the greater ability to harm you. And that's what makes so much of this thread ridiculous. Some here are excessively paranoid about the most peripheral and fleeting contacts, yet don't care about the data mining operation that you're securely funneling all your information to. If that's what makes you paranoid, then I'd have to say you're not paranoid enough, not by a long shot.

That's the "I have nothing to hide" argument.

No, read above. It's the assessment of risk argument. And one that's pertinent on many, many levels. As has been pointed out by several parties on this list, SSL-everywhere is not a zero-cost proposition, so if you're going to go to that length there should be tangible benefit.

It does not matter how embarrassing it is or isn't. Irrelevant. It's much simpler: unless they want you to know (or it affects you directly in some undue manner etc. --
insert reasonable qualifiers here), you have no business knowing. How yawn-worthy that information is makes no criterion. The one criterion that does apply is whether making the channel secure against you trying to find out is too expensive relative to its sensitivity. So far, MetaCPAN seems to be less than straining under the load, so I don't see a justification to reconsider. We used to avoid SSL unless necessary because it was expensive. I agree with the engineers who are saying that it's time to re-examine that as a default assumption -- whether they are employed by Google or not makes no difference to me as far as that statement is concerned.

Someone else pointed out that SSL is not trivial or low cost to many embedded devices. That's true. I pointed out that it makes traffic shaping and caching strategies to relieve backbone congestion far more complicated. It may be cheap for servers to terminate those connections with the power inherent in the average modern server, but that's just technical narcissism. It gives no thought to the rest of us at all.

You won't see me disagreeing that the concentration of power in Google's hands is dangerous. But that's a different matter, even though very important in its own right. Abolishing Google would not reduce the justification to secure communications. The two issues are independent -- so the question you pose is entirely beside the point to the matter at hand.

I have no wish to abolish Google, and this isn't just a Google problem; it's a social media problem, a search engine problem, it's a problem of trust with any third party that you lack contractual safeguards or control over. That said, I still think we should have an actual *benefit* before we slab a dollop of SSL on everything. I'm not opposed to metacpan having an SSL interface, but why on earth would you place barriers to use on a public resource, containing public information?!
That's the operators of metacpan forcing their peculiar dogmatism and fanaticism on the rest of us. Which I don't actually have a problem with as long as they're not the *default* or *only* repositories of that information. But if they aim to become such they need to be bent on maximum accessibility. That's just common sense. Explain to me why giving people a choice of interfaces is a bad thing. SSL gives them a hard-on. Great. I share their preferences, but I don't share the inclination to force it on the rest of the world. Taken to that extreme, we'd be SSL'ifying content distribution networks. And is that friendly to the network operators carrying that traffic? Is that what we really want to do? Might as well, I guess, since even that traffic has *some* intel value. But I would argue that the cost incurred for very little real benefit should be considered. --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Tue, 30 Aug 2011, sawyer x wrote: All you had to do was originally write "as much as I understand people's desire for encryption, I still believe that 1. SSL is only necessary on specific websites (example A, example B) and 2. when working with Google we shouldn't be worrying about encryption there, but rather Google itself." Instead you opted to butt heads with someone, belittling their whole "SSL doesn't have large overhead" remark with "who cares? Google!" You could have made an eloquent, respectful comment, saying that while SSL apparently doesn't cost much, Google is really what bothers you and that you'd rather have a discussion about that. I don't think anyone (including myself) would have anything bad to say about it, and you would have been most likely successful at raising that point of issue. I've personally moved to DuckDuckGo and am considering replacing Gmail. <G>

I guess the little winky smiley face on my original post was lost on you, eh? I shall have to be far less subtle in the future, but for now I'll let my e-mails stand on their own. And I won't point out how I specifically requested that a Google-centric conversation should be held off-list... Oops. ;-)

Unfortunately, I've most likely committed the same belittling, whether it was towards you, Shlomi, David, or anyone else here. So, my apologies for this and I will be clearing my desk of this thread.

I thought that the whole thread was silly, as is the concept that metacpan would dictate SSL-only for questionable gains. And I think my interjection was pretty fair, inoffensive, and good natured. But, maybe quietly lurking exposes my better side. :-) --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Mon, 29 Aug 2011, David Nicol wrote: I'll take this bait, swallow it, and hopefully bite off the line: Yes, Google is going to use query data for its gain. But Google's business model also involves *aggregation* and *respecting individual privacy*. The SSL to Google Search is supposed to protect one from eavesdropping, as has been pointed out, by the other people in Starbucks. And it does this. Say you're sitting in Starbucks, searching for clues concerning an embarrassing medical condition. Your risk is, Mallory will intercept your packets and tell his buddies and they will huddle and point. If some Google tech sees your query among the millions of other queries and points it out to /his/ buddies and they huddle and point, that doesn't affect you the same way, if at all. They won't be pointing at you, the victim of an embarrassing medical condition; they will merely be pointing at evidence of your existence. And such attention might actually bring more attention, in general, to the problem of severe triskaidekaphobia or whatever, which would be a good thing for you -- in the aggregate. The resulting open discussion of severe triskaidekaphobia might help lift the crippling stigma that has followed the victims for so long, without any unpleasant direct confrontations.

I think you're still missing my point and focusing on defending a company you obviously like. Contact me off the list if you want to discuss/debate the actual dangers that companies like Google present. Otherwise, let's focus on the crux of my argument: trusting any third party with your personal information, whose primary business is selling the use of your information, is foolish, and the use of SSL as your conduit to them should not make you feel safer. That company is liable to be a greater danger to your privacy than random wifi eavesdroppers.
Likewise, the use of SSL to conceal your access of highly public (and specialized) information on metacpan also provides no tangible benefit for 90% of the users. They should offer SSL as an option, but not mandate it for those of us who derive no benefit from it. Again: a resource like metacpan should aim for maximum accessibility... --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Sun, 28 Aug 2011, Aristotle Pagaltzis wrote: http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html "In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy *no additional machines* and *no special hardware*. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that. If you stop reading now you only need to remember one thing: *SSL/TLS is not computationally expensive any more*." [...] Also, don't forget that we recently deployed encrypted web search on https://encrypted.google.com. Switch your search engine!

These comments are pretty funny once you consider that you're making a secure connection to an independent party who has a commercial and fiduciary responsibility to exploit every bit of data you give them. With friends like Google protecting your information, who needs encryption? ;-) --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Sun, 28 Aug 2011, Aristotle Pagaltzis wrote: Right, so just let everyone in any coffee shop or any other open network you connect to sniff all your traffic. Did you have an actual point?

Yep, but it appears you completely missed it. I use encryption all the time, but outside of authentication its merit is questionable when it concerns information that is a) public information (especially in the context of published open source) and b) information going to an untrustable third party like Google. Personally, I have no use for metacpan, and don't care what they do. But as a general operating principle, I like to use the appropriate tools where they're *appropriate*. I expect my bank's websites to be fully SSL, I expect my on-line brokerage's sites to be fully SSL. But what exactly is the risk with a search engine for highly specialized and highly public information? I fail to see the benefit, and I tend towards paranoia naturally. OSS is about freedom of choice. As long as users have a choice (an alternative to metacpan), feel free to force your preferences on the users. But in the unfortunate circumstance where metacpan becomes the only choice it'd be nice if the maintainers try to be a little less dogmatic about it. They should be inclined towards maximum accessibility, not maximum pedagoguery. I know I didn't get the memo, but I think someone did claim that metacpan was the de facto interface these days... --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Sun, 28 Aug 2011, sawyer x wrote: You clearly misunderstood Aristotle. He doesn't care about a comment against Google, and I'm sure he has no special affinity towards it. He simply had a good remark on a discussion of the effectiveness and CPU costs of SSL encryption and it was ignored with a completely irrelevant comment. Google might be another Microsoft, it might be worse, but it is *irrelevant* to the question of SSL security and the costs of enabling it by default.

My humor was perhaps too subtle, since you didn't get the relevance of my reply. Google switching to SSL by default is as pointless as metacpan doing so. In the former case it's the protection of delivery to/from an entity that not only doesn't have your best interest at heart, but has a business built on exploiting *your* information for *its* benefit. Utterly pointless. In the latter case you have a search engine whose use is basically the retrieval of information based on *published* open source software, and highly published at that, given the world-wide replication of CPAN itself. What exactly is metacpan protecting? Is it that embarrassing that programmer Joe can't remember what module function foo was defined in? Can someone really derive significant benefit from witnessing Harry browse the API for WWW::Retrieval::LOLCats or what have you? So, regardless of the incremental costs of implementing SSL, *why* is the mandatory use of SSL even considered intelligent, rational, or in any other way beneficial? I wasn't going to get involved in this thread, but the Google bait was too spot on to ignore. --Arthur Corliss Live Free or Die
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Sun, 28 Aug 2011, Eric Wilhelm wrote: I didn't think it was a question of CPU speed anytime in the past decade. How does a proxy cache encrypted data?

Bringing up proxies is an excellent point. While most proxies do support SSL tunnelling, this does make the request uncacheable, since the proxy never knows anything about the connection outside of the host/port it's tunnelling to. I run a proxy cluster myself, and I do force caching of search engine responses for a short window (typically on the order of a few hours), and it does tend to pay off, especially when notable events occur in the world. Obviously, SSL bypasses the cache altogether. And I can only get away with this because the businesses I support all want the same safe levels applied to all requests, so I don't have to worry about inappropriate content in some people's results. Which brings to mind yet another point: for those of us providing content filtering services via proxies, SSL is a huge problem. The only good solution is to do transparent interception of SSL connections, with your proxies serving up a private CA-signed certificate using wild cards, but that requires installing your private CA's root certificate on all clients, and even then there are clients on which that still won't work. Never mind that the concept of spoofing external organization certificates is insanely dangerous in its own right. --Arthur Corliss Live Free or Die
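For the curious, the interception approach described above looks roughly like the following in a Squid-style configuration (the directive names are from recent Squid releases; the port, paths, and CA file are made up, and this is a sketch of the idea, not a drop-in config):

```
# Intercept TLS and re-terminate it with certificates minted on the fly,
# signed by a private CA whose root must be installed on every client.
https_port 3129 intercept ssl-bump \
    generate-host-certificates=on \
    cert=/etc/squid/private-ca.pem

# Helper that generates and caches the forged per-host certificates.
sslcrtd_program /usr/lib/squid/security_file_certgen -s /var/lib/squid/ssl_db -M 4MB

acl step1 at_step SslBump1
ssl_bump peek step1    # read the client's SNI first
ssl_bump bump all      # then impersonate the origin server
```

As the post says, this only works for clients that trust the private root, and the act of forging other organizations' certificates is a serious risk in itself.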
Re: MetaCPAN is quickly becoming the de-facto interface to CPAN
On Sun, 28 Aug 2011, Arthur Corliss wrote: snip Which brings to mind yet another point: for those of us providing content filtering services via proxies, SSL is a huge problem. The only good solution is to do transparent interception of SSL connections, with your proxies serving up a private CA-signed certificate using wild cards, but that requires installing your private CA's root certificate on all clients, and even then there are clients on which that still won't work. Never mind that the concept of spoofing external organization certificates is insanely dangerous in its own right.

I'm going to preemptively qualify this brain dump as relevant to the metacpan debate because I would consider metacpan's content, search results, etc., to be highly cacheable. More so than a general purpose engine like Google, metacpan's results would tend to be more applicable to multiple users' searches. And yet the whole SSL-only mindset would hamper an individual network operator's ability to control and shape its network. Hopefully no one misconstrues this as me being against SSL sites; I'm extremely in favor of them, particularly with organizations hosting my sensitive information. I only think metacpan should offer both HTTPS and HTTP interfaces. Let those ultra-paranoids among us use HTTPS, and the rest of us HTTP. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote: On Apr 2, 2010, at 1:50, Arthur Corliss wrote: And my assertion has been that the excessive stats by the server are a bigger impediment to synchronization than the inode count. Well, then one of us don't understand how file systems etc work. :-)

Indeed. If you're running UFS perhaps you might have a gripe. But with many filesystems in use supporting dynamic allocation groups, with the inode data stored near the actual data blocks, along with b-tree indexing, this isn't as much of an issue for many of us. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Wed, 31 Mar 2010, Ask Bjørn Hansen wrote: snip Everyone who doesn't run mirrors says "oh, who cares - it doesn't bother me." Some of us who do run mirrors say "actually, that sort of thing is important and an actual issue." Others reply "then you're doing it wrong." But nobody came with something reality-based that'd be right.

Some revisionist history here. I run mirrors (not CPAN) and know full well the limitations and inefficiencies of rsync. To date, not one of you has been able to refute that at this scale rsync is hurting you. But most of you have been obstinately against finding a more efficient way of doing things. I've made a viable suggestion, and offered some time to work on it. But you've made it abundantly clear that it's not welcome.

The main point here is that we can't use 20 inodes per distribution. It's Just Nuts. Sure, it's only something like 400k files/inodes now - but at the rate it's going it'll be a lot more soon enough.

That's a problem, but not likely the biggest drag on server I/O you're suffering. Might that be, ahem, rsync?

HOWEVER: Right now more of those are wasted on other things (.readme files, symlinks, ...) -- some of which have solutions in progress already. I don't think anyone is arguing that we NEED to delete the old distributions; only that they do indeed have a cost to keep around in the main CPAN.

You're right, I'm not arguing the need for the cruft. I've only pointed out the obvious reality that trimming files only postpones the I/O management issues that at some point are likely going to have to be addressed anyway. And that you'll get less bang for the buck (or man hour) by treating the symptoms, not the disease. For the record: if that's what you want to do, have at it. Let's just not be disingenuous about the fact that we're abrogating our responsibilities as technologists by refusing to address the real problems and weaknesses of the platform. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote: I can't believe I'm doing this, but ... :-)

All for entertainment's sake...

"The main point here is that we can't use 20 inodes per distribution. It's Just Nuts. Sure, it's only something like 400k files/inodes now - but at the rate it's going it'll be a lot more soon enough." "That's a problem, but not likely the biggest drag on server I/O you're suffering. Might that be, ahem, rsync?" That reply doesn't even make sense.

Then you've ignored most of this thread. Inode counts themselves aren't indicative of anything. It's the I/O access patterns that are. And my assertion has been that the excessive stats by the server are a bigger impediment to synchronization than the inode count.

"You're right, I'm not arguing the need for the cruft. I've only pointed out the obvious reality that trimming files only postpones the I/O management issues that at some point are likely going to have to be addressed anyway. And that you'll get less bang for the buck (or man hour) by treating the symptoms, not the disease. For the record: if that's what you want to do, have at it. Let's just not be disingenuous about the fact that we're abrogating our responsibilities as technologists by refusing to address the real problems and weaknesses of the platform." You are confusing we, I and you again.

Perhaps.

Yes, I (and I'm guessing everyone else who has thought about it for more than, say, 5 seconds) agree that having rsync remember the file tree to save the disk IO for each sync sounds like an obvious solution. But reality is more complicated. If it was such an obviously good solution someone would have done it by now. (For starters play this question: What is the kernel cache?)

It hasn't been done because it's outside the scope of rsync's design. It's meant to sync arbitrary filesets in which many, if not all, changes are made out of band.
It's decidedly non-trivial to implement in that mode unless you're willing to accept a certain window in which your database may be out of date. But in a situation like PAUSE, where the avenues through which files can be introduced into the file sets are controlled, it does become trivial. It's the gatekeeper; it knows who's been in or out.

Andreas' solution is much more sensible -- and as has been pointed out before we DO USE THAT; but the problem here is not with clients who are interested enough to do something special and dedicate resources to their CPAN mirroring.

By all means, I'm not opposed to any solution that actually addresses the problem. I don't agree that would be the fastest route to implementation, but no question as to whether File::Rsync::Mirror::Recent would help things. I'd support (and help) that goal. My objections are more properly directed to those stuck on just deleting files from the tree. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote: Talk = ZzZz. Code = Interesting. Deployment = Useful.

Please. The talk serves to gauge interest before I waste any time implementing a solution that's already been rejected out of hand. As I've mentioned repeatedly, I already use rsync, albeit on much smaller filesets which don't kill my servers. So far I haven't seen much openness from those actually affected by the problem toward considering an alternative to rsync. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Tue, 30 Mar 2010, Matija Grabnar wrote: Er, not exactly. Read http://www.cvsup.org/howsofast.html I had read http://www.cvsup.org/faq.html#features item #3. From what I can see, cvsup uses the rsync algorithm on a file-by-file basis (it uses just the differential send part of the rsync algorithm). It doesn't rsync the whole tree, which was what I understood to be the original problem (wasn't the complaint about the flood of stats?).

Sounds like I may have interpreted the FAQ incorrectly, then. Thanks for pointing that out. I have a few questions, though: the explanation says "At the same time, the Tree Differ generates a list of the server's files." That seems to imply that it's doing the exact same thing as rsync, so all the stats are still present on the server, right? Nowhere do I see it mentioning that the daemon is maintaining state between requests. The primary speed-ups (beyond special file update handling) are better use of bidirectional bandwidth. Do you have access to a cvsup server so you can verify its behavior?

So if you want to make a tool that works fine for large mirrors, your priority apparently should be to reduce the "lots of stats" part, which is used to determine exactly what files need to be considered for checking. (Rsync already makes sure all the *other* I/O operations are minimized.)

Agreed. Now the key, as I see it, is that unlike all the other use cases where rsync is used, large mirrors are likely to have their directories directly transferred from another mirror. So the client that pulled the tree update down could store a list of changed files, and the server could then just use that list to determine which files need to be synced to the downstream mirror. (Sure, the original site has to generate the list, but if they use a tool like PAUSE to upload the files, that shouldn't be hard to do.)

Agreed, but I'm not sure we've gotten past the stat storm on the server, though. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Tue, 30 Mar 2010, Rene Schickbauer wrote: snip This could work like any modern, distributed version control system. That way, the user would also be able to apply local patches and/or decide which changesets to pull in from the main server. Or have a complete, local mirror and one for the production systems where he/she pulls in changes after they have been reviewed. NOW it's time to kick my butt, if you want to. :-)

No one can accuse you of not being ambitious. It's a neat idea, but definitely an involved solution. While it could solve a lot of problems, I think the human component is going to be your biggest obstacle. As we've seen from the reaction to the heretical notion of ditching rsync, I have to imagine getting everyone to ditch their favorite RCS tool would be even worse. Basically, we should just all get on board with git (disclaimer: I don't use git myself, so my understanding may be deficient), a decentralized, distributed RCS, and have developers periodically merge their branches. Tough sell. It probably would solve a bunch of issues, but you're treading into vi versus emacs territory. ;-) --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, dhu...@hudes.org wrote: The entire point of rsync is to send only changes. Therefore once your mirror initially syncs, the old versions of modules are not the issue. Indeed, removing the old versions would present an additional burden on synchronization! The ongoing burden is the ever-growing CPAN. That's not entirely true, particularly when you're talking about rsync. Remember, old synced data doesn't have to be transferred, but it still needs to be checked for potential changes, something rsync does for every request. That generates a crap load of I/O in the form of stats on the server. The danger in a CPAN::Mini and in removing old versions is that one is assuming that the latest and greatest is the one to use. This is false. Take the case of someone running old software. I personally support systems still running Informix Dynamic Server 7.31 as well as systems running the latest IDS 11.5 build. We have Perl code that talks to IDS. If DBD::Informix withdrew support for IDS 7.31 I would need both the last version that supported it as well as the current. I can get away with upgrading Perl, maybe, but to upgrade the dbms is much more problematic (license, for one thing; SQL changes another). This is a good example of the potential pitfalls of pruning, to be certain. Even if all the authors dutifully documented all the necessary scenarios that would require pinning specific versions on CPAN it's almost guaranteed that there's still going to be collateral damage. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Nicholas Clark wrote: Are you running a large public mirror site, where you don't even have knowledge of who is mirroring from you? (Not even knowledge, let alone channels of communication with, let alone control over) Because (as I see it, not having done any of this) the logistics of that are going to have as much bearing on trying to change protocols as the actual technical merits of the protocol itself. I do run mirrors and am mirrored from. Not on the scale of CPAN (in terms of file count), but having long been aware of the effect of rsync servers I have explored the scalability aspects of it. It should be obvious that a cut-over to a new syncing tool can't be done on this scale in one fell swoop. Obviously, there'd have to be a gradual migration where protocols are supported concurrently, much like FTP and rsync are currently both supported. We add a new option and encourage people to move over. Since we already have a list of the public mirrors we should have some idea of where to start that conversation. Most of the cost of rsync is an externality to the clients. If one has an existing mirror, and one is using rsync to keep it up to date, what's the incentive to change? Common sense and professional courtesy. Especially because it's likely that some clients running public mirrors may be a sync source for some private mirrors. They may not feel the pain of the master repositories, but they certainly share a portion. And it's not likely that many mirrors have a capital budget to support scaling a free service, so it would be best to make efficient use of those resources. I'm missing something here, I suspect. How can HTTP be more efficient than rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to instruct a client (such as wget) to get it all. 
In which case, in the course of doing this the client is going to recurse over the entire directory tree of the server, which, I thought, was functionally equivalent to the behaviour of the rsync server. You are missing something, but I may not have been explicit enough. HTTP or FTP can easily be the payload transport once you know the precise files that need to be transferred. That is tremendously more efficient than what rsync does on the server. So, use rsync (or FTP mgets, etc.) to transfer your transaction logs, compile a list of new files to retrieve, and use those very common and low-overhead protocols to transfer the files... --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Elaine Ashton wrote: I do very much like Tim's proposal for giving old modules a push to BackPAN since, with proper communication of the changes to the authors along with a way to mark exceptions, this would rid CPAN of a lot of cruft that should be on BackPAN anyway. I'm not trying to be a dick (not intentionally, anyway), but isn't that basically making your problem BackPAN's problem? --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Andreas J. Koenig wrote: Says the author of a module named Paranoid. A lovely coincidence. :-) As they say, just because you may be paranoid, it doesn't mean that no one's out to get you. If you want to study the CPAN checkpointed logs solution running on the very CPAN for exactly one year now: File::Rsync::Mirror::Recent What needs to be done is really extremely trivial: rewrite it in C and convince the rsync people to include it in the rsync code base. Just that. So are you a taker, Arthur? Heh, nice. That sounds much more involved than my proposal, plus it leaves us entirely at the mercy of an outside organization (the rsync folks) who may or may not care about our needs. I think it would be a worthy cause ultimately, but certainly a much longer time to implementation, and considerably more effort. Kind of sounds like the normal stonewalling I've been getting these last few days from our resident rsync fetishists. Very ironic. I use the hell out of rsync, just more discriminately than you guys, and yet I'm public enemy number one. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Dana Hudes wrote: Use of wget and http to download an entire site means numerous TCP opens and HTTP GET requests. The entire point of rsync is that it knows there are numerous downloads. It does ONE open. This allows TCP slow start to ramp up That wasn't exactly what I was suggesting. And we'll ignore HTTP's Keep-Alive support for the time being, which negates your TCP open issue. If you're fetching transaction logs by which you can determine beforehand precisely what files to retrieve, HTTP or FTP will beat the pants off of allowing rsync to tell you what you need to retrieve and delivering it. A multi-download session with ftp is also efficient. Clients like ncftp have batch transfer built in. If setting up an initial mirror you might do better with ftp, but maintaining it is where rsync rules. I haven't looked closely but I have the impression from watching wget work that wget using HTTP::Date opens two TCP connections per file: it opens a socket and issues a request for the timestamp, then closes it, then opens a socket to issue an HTTP GET if it wants the file. Then it closes that socket and the process repeats for the next file. It keeps hoping for the timestamp even if the server doesn't support HTTP::Date. Rsync and ftp are stateful; http is not. For getting one file http is better since you skip the whole login thing and setting up data and control sockets. So a CPAN client session will do better with an http mirror: it gets a tar.gz, opens it up, processes it, and then goes back many seconds from the original request for the first dependency. Repeat until the entire dependency tree is completed. Dude, you definitely don't understand what we're discussing. And none of rsync, ftp, or http is stateful -- that's the problem. Rsync has to build a picture of the repository's state *per* request, even for the old files that haven't been touched in years. It then uses that information to select and deliver the new files you need. 
Maintaining state means that you maintain knowledge of state over time, across multiple requests. And rsync doesn't do that, it simulates it. Quite cleverly, but in a very expensive way whose cost is borne by the server. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Dana Hudes wrote: Why is rsync a problem? Where is the bottleneck in the protocol or the code implementing it? Specifics! SAR is antiquated and doesn't give the info you really need. Using a linux system? Use procallator and feed the resulting collected data to ORCA. Better yet, use DTrace or at least truss. Compile rsync with profiling code -- use Sun Studio 12, it runs on Linux as well as Solaris and it's a free download. Wow. You kids and your new shiny toys... Look, here's a nice little specific example for you. I run an rsync server that contains 8,700+ files and directories. Now, say I want to sync a mere thirty-two new files. Making that request on my server causes the rsync daemon to stat the entire hierarchy to the tune of 18,000+ lstats. Per request. Freaking ouch. And that's a tolerable use-case in my mind for rsync. That's a hell of a lot of I/O generated which would take but a couple of stats to retrieve via HTTP or FTP. Assuming you knew what you needed already. Now, when you add in a file set of sufficient size to exhaust filesystem caching, plus a crap load of concurrent requests, my archaic SAR reports written on stone tablets tend to say your I/O wait states start pushing the load levels unacceptably high, not to mention the pages being thrashed from memory's cache pool, high interrupts and excessive seeks on the drives, and so on and so forth. *sniff* Cavemen are people, too. Now, look at the size of CPAN with *hundreds* of thousands of files. Can you imagine that amount of I/O *per* request?! From a network protocol perspective rsync is quite good. If your network capacity is so large that it exceeds the bandwidth or IOPS of your disks you probably can afford better disks or a more efficient disk storage layout. Are mirrors like nic.funet.fi running multiple gigabit WAN connections? If so they could sure demand-stream more than a bunch of SATA2 disks can provide. 
Without performance data it's a waste of time to argue against rsync And without having examined how rsync works on both ends it's a waste of time to argue the merits of rsync. --Arthur Corliss Live Free or Die
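Arthur's 8,700-file example generalizes: a full-scan sync has to stat every entry in the tree on every request, no matter how few files actually changed, while a manifest-driven sync only touches what changed. A toy illustration in Python (Python rather than Perl purely for brevity here; the tree layout is made up):

```python
import os
import tempfile

def count_scan_entries(root):
    """Count every entry an rsync-style full scan must lstat per request,
    regardless of how many files actually changed."""
    n = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n += 1 + len(filenames)  # the directory itself plus its files
    return n

# Toy tree: 3 directories of 10 files each. A full scan touches
# 34 entries (root + 3 dirs + 30 files) even if only one file is new;
# a transaction log would name just that one file.
root = tempfile.mkdtemp()
for d in range(3):
    sub = os.path.join(root, "dir%d" % d)
    os.mkdir(sub)
    for f in range(10):
        open(os.path.join(sub, "file%d" % f), "w").close()
```

Scale the same ratio up to CPAN's hundreds of thousands of entries and the per-request cost of the scan dwarfs the cost of the actual transfer.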
Re: Trimming the CPAN - Automatic Purging
On Fri, 26 Mar 2010, Elaine Ashton wrote: Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2 years which is the canonical mirror for a large number of mirrors and the perspective of having a few terabytes spinning in storage changes quite dramatically when you are actually serving a few terabytes to thousands of clients. CPAN grew to be quite a burden on the site not only because of the high demand, but also because of the multitude of small files and I'm sure other mirrors feel similarly burdened. Don't be such an arrogant prick. You guys made baseless assumptions about people's experience with storage management in an attempt to disregard their opinions. That's being a dick by any metric. The sort of pruning Tim brought up has long been an idea, but with the current and growing size of the archive, something does need to be done to alleviate the burden not only on the canonical mirrors, but also on the random folks who want to grab a local mirror for themselves. In my present work environment, 12GB isn't a lot of disk space, but it's a lot considering I don't need to install perl modules daily and the vast majority of it I'll likely never use. It would be a kindness to both the mirror operators and to the end-users to trim it down to a manageable size. I think I was quite explicit in saying that efficiencies should be pursued in multiple areas, but the predominant bitch I took away from your thread dealt with the burden of synchronizing mirrors. What's the easiest way to address that pain? I don't believe it's your method. I'd look into the size issue *after* you address the incredible inefficiencies of a simple rsync. As for efficiency, rsync remains a good tool for the job that works on nearly every platform which is a rather tall order to match with any other solution. Relegating the cruft to BackPAN to make the current CPAN slimmer and less demanding on all fronts is an idea that would be welcomed by more than just mirror ops. 
Rsync is an excellent tool for smaller file sets. I use it to sync my own mirrors; those mirrors are typically ~10k files. Am I surprised that it doesn't scale when you're stat'ing every single file? No. Which is why alternatives should be considered. A simple FTP client playing a transaction log forward is trivial. I maintain several mirrors, most with rsync. But that's with a clear understanding of the size of the file set. Use the right tool for the job. And it seems apparent to me that rsync isn't the right tool for ~200k files. The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this recently but can't remember exactly which module it was) but I think it's rare and the practice should be discouraged. Try doing a simple cost-benefit analysis. What you guys are proposing will help. But not as much as simpler alternatives. Like replacing rsync with a perl script and modifying PAUSE to log the transactions. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Nicholas Clark wrote: I You? Or someone else? I am quite happy to agree that your understanding and experience of storage management is better than mine. But that's not the key question, in a volunteer organisation. The questions I ask, repeating Jan's comments in another message, are. Oh, I understand that fully. And I'd be happy to lend some of my time. But you don't make people inclined to help when people are lobbing snarky comments like "we'll wait breathlessly for you to do it". The impression I'm getting from most of you right now is that you're hell bent on solving the problem your way, and no one is interested in exploring the technical merits of other approaches. Hell, I would even help with work towards your desired method *if* I thought that was the consensus after a genuine exchange and consideration of ideas. I definitely won't, should it appear that we have some kind of elitist cabal that will make their decision in isolation. If that's going to be the case then this should never have been raised on an open forum like the module authors' list. Quite frankly, at times some discussions on this list fail the concept of a technical meritocracy, and tend towards an established aristocracy. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote: The time-honored tradition of many open source communities is to talk. And talk. And talk. The problem is that this solves nothing. To do, does. You are free to decide to take this as a personal insult. I didn't take it as an insult, I took it as what it was -- a dodge. You already have your minds made up and are not willing to evaluate options on their merits. Let's just be honest about what's going on here. --Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Elaine Ashton wrote: Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :) :-) You'll have to pardon my indiscriminate epithets. The barbs are coming from multiple directions. My point still stands, however. Your experience, however worthy, has zero bearing on whether or not my experience is just as worthy. Even more so when you guys have zero clue who you're talking to. And you shouldn't have to know. I would have thought simple communal and professional courtesy would be extended and all points considered in earnest. Which does not appear to be the case. And you're disregarding a considerable problem that rsync is a well-established tool for mirroring that is easy to use and works on a very wide range of platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when they often have several or more, likely won't be met with much enthusiasm and would create two tiers of CPAN mirrors, those using rsync and those not, which would not only complicate something which should remain simple but, again, doesn't address the size of the archive and the multitude of small files that are always a consideration no matter what you're serving them up with. Ah, you're one of them. All objects look like nails when all you have is a hammer, eh? Rsync is a good tool, but like Perl, it isn't the perfect tool for all tasks. You've obviously exceeded what the tool was designed for, it's only logical to look for (or write) another tool. Ironically, what I'm suggesting is so basic that rsync can be replaced by a script which will likely run on every mirror out there with no more fuss than rsync. FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't remember the last time I even used ftp come to think of it. 
I had to go through 2 layers of network red tape just to get rsync for a particular system I wanted to mirror CPAN to at work. Asking for FTP would have been met with a big no or a cackle, depending on which of the nyetwork masters got the request first. Sounds like you may be hamstrung by your own bureaucracy, but that's rarely the case in most of the places I've worked. Not to mention that between passive mode FTP or even using an HTTP proxy (most of which support FTP requests) what I'm proposing is relatively painless, simple, and easy to secure. This concern I suspect is a non-issue for most mirror operators. Even if it was, allow them to pull it via HTTP for all I care. Either one is significantly more efficient than rsync. How is replacing rsync, a standard and widely used tool, simpler for mirror ops? I suppose I don't understand the opposition to trimming off the obvious cruft on CPAN to lighten the load when BackPAN exists to archive them. There is already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) so it's not as though lightening the load is a new idea or an unwelcome one. I'm not opposed to trimming the cruft, but I am opposed to ignorant knee-jerk reactions bereft of any empirical data (or at least none you've shared). The cruft, while being cruft, isn't inherently evil. You have a basic I/O and state problem. And the I/O generated is predominantly caused by rsync trying to (re)assemble state on the file set, *per* request. More appallingly, most of that state image being generated is state that hasn't changed in quite a while. Literally years in many cases. So why are we wasting cycles and I/O performing massively redundant work? That's why having PAUSE implement a transaction log, and perhaps a cron job on the master server doing daily checkpointed file manifests, is so much more efficient. An in-sync mirror only needs to download the latest transaction logs and play them forward (delete certain files, download others, etc). 
And, gee, just about every author on the list could write *that* sync agent in an evening. Out-of-sync mirrors can start by working off the checkpoint manifest, get what's missing, and roll forward. What you're overlooking is that CPAN has grown, and will continue to grow. Even if you remove the cruft now, at some point it might grow to the same size just with fresh files. When that happens, you're right back where you are now. Rsync can't cut it, it wasn't designed for this. Whether you like it or not, even on a pared-down CPAN rsync is easily your most inefficient process on the server. If you're not willing to optimize that, then you really don't care about optimization at all. --Arthur Corliss Live Free or Die
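The checkpoint-manifest-plus-log scheme Arthur describes really does look like evening-project material. A rough Python sketch, with the caveat that the "add path" / "del path" log format is invented for illustration (PAUSE publishes no such log):

```python
def play_forward(manifest, log_lines):
    """Roll a checkpoint manifest forward through a transaction log.
    Returns (files the mirror must fetch, resulting up-to-date file set).
    The 'add path' / 'del path' log format is hypothetical."""
    current = set(manifest)
    to_fetch = []
    for line in log_lines:
        op, path = line.split(None, 1)
        if op == "add":
            current.add(path)
            to_fetch.append(path)
        elif op == "del":
            current.discard(path)
    return to_fetch, sorted(current)
```

An out-of-sync mirror would diff its local tree against the checkpoint manifest once, then only ever fetch logs; the server does no tree walk at all, and the payload files themselves can come down over plain HTTP or FTP.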
Re: Trimming the CPAN - Automatic Purging
On Fri, 26 Mar 2010, Ask Bjørn Hansen wrote: I find it curious that everyone who's actually involved in syncing the files or running mirror servers seems to think it generally sounds like a good idea and everyone who doesn't says it's not worth the effort. Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of storage as part of my day job. I think it's a tad presumptuous to disregard input just because we're not in your inner sanctum. As I mentioned in a follow-up e-mail: this is simply a matter of selecting the correct problem domain. I believe that streamlining the mirroring process will provide greater gains for less effort. That's not to say that pursuing other efficiencies isn't worthwhile, just that you need to prioritize. But what the hell do I know. I don't run a *CPAN* mirror, so I must be freaking clueless... --Arthur Corliss Live Free or Die
Re: Module uploaded - whats next?
On Mon, 7 Dec 2009, Jonathan Rockway wrote: * On Fri, Dec 04 2009, Bill Ward wrote: Yep, that's why I didn't use Catalyst and would never suggest it to anyone... it's an IT nightmare. Seriously? My Catalyst app works as soon as I type # apt-get install libcatalyst-perl on my Debian Stable system. It isn't quite Java + WAR, but it is also no IT nightmare. Please get the facts straight before spreading FUD. Your argument would be more persuasive if it was true. I have no beef with you Jonathan, but I think you're missing the mark; it is a legitimate criticism. I maintain my own distribution that I use for work and personal use. The reality is that I have to do that packaging work for my distribution. Now, bear in mind that I do use Catalyst, but it's a fact that overly complex applications with a long list of dependencies do place a huge burden on IT infrastructure. It wouldn't be bad if Catalyst was the only major Perl code I had to support, but, gee, I happen to like and use Perl almost freaking everywhere. And when you have that many packages it's a question of *when*, not *if*, an update to a common dependency will break one or more applications. APIs change, certain functionality gets deprecated, or code bases are split into separate module namespaces, etc. I'm not saying that this is a frequent occurrence, but it does, has, and will happen again. Catalyst has by far the longest list of dependencies of any Perl code I support, bar none. It's a very flexible and extendable framework, but it would be naive to think that that doesn't come at a price. And I pay that price regularly. Now, before I hear "just use Debian", let me dispel that ignorance. One of the primary reasons why I maintain my own distro/packages is to make sure that revs and APIs don't change underneath me just because some distro developer gets a wild hair and wants to live on the bleeding edge. That, and to avoid them doing stupid things like, say, lobotomizing the RNG in openssl. 
I have better things to do with my time than wonder if apt-get is going to pull in a new magical combination of revisions that's going to break some code I need to just work. --Arthur Corliss Live Free or Die
Re: Module uploaded - whats next?
On Fri, 4 Dec 2009, Dave Rolsky wrote: The idea that you couldn't learn the basics of Catalyst and get things running in the same time seems unlikely. Also, you haven't factored in all the time it's going to take you to add features and fix bugs already present/fixed in an existing tool. I have no interest in any possible outcome of this discussion, but I have to throw out a small tangential comment: as someone who has to support a Catalyst deployment I'd like to know if anyone here has unrolled the ridiculously long list of dependencies necessary for Catalyst?! As a fellow dev I struggle often with trying to achieve the balance between not reinventing the wheel and not including the wheel that has its own tractor trailer attached towing a mobile factory. And as a fellow admin as well I may have to err towards a framework that provides the core functionality I need with the least number of moving parts. Let the guy introduce another framework. None of the existing frameworks are devoid of any sizeable cons. --Arthur Corliss Live Free or Die
Re: Weekend entertainment
On Mon, 16 Nov 2009, Jonas Brømsø Nielsen wrote: Aloha, Most cultures/countries have a redneck equivalent, so it is not totally wasted on us non-US... anyway. I say if we have offended anybody, we remove the offending distribution. It is in the Acme namespace, so it will not create a lot of fuss removing it. The author has the choice and freedom of distributing via other channels, but we as a community might have to be more democratic about it, so I say it goes. As for the apology, I think we could just get by with a friendly and polite email, informing the offended person of our actions. Going into a deeper analysis of the nature of humor and cultural differences is a waste of time, CPAN is an international resource and therefore we should act accordingly. The Acme namespace is still open and welcoming to all the crazy experiments, but we have to play nice at the same time. jonasbn Personally, I believe that anyone who's wasted more than 30 seconds on this entire discussion should get a life. It's more than obvious what the author was satirizing and no one has an inalienable right to not be offended. If you don't like it, fine, blog about it, complain to your friends. But let the rest of us move on with matters of some actual import. Quit trying to be the politically correct thought police of the world. --Arthur Corliss Live Free or Die
Re: Weekend entertainment
On Mon, 16 Nov 2009, Jonas Brømsø Nielsen wrote: Hi Arthur, We have a saying in Danish, "go for the ball, not the man" I don't disagree with your saying, but this thread needs to go away. It seems to be much ado about nothing by the very thin skinned. Somebody invoke the Usenet Hitler rule quick! I've now wasted two e-mails on this subject. I may develop a rash. --Arthur Corliss Live Free or Die
Re: Help needed testing security of login module
On Wed, 20 May 2009, Bill Ward wrote: 2. Make sure to have a salt value, as it prevents the use of rainbow tables to get a password. So you have the hash and a known salt kept separately (the salt is plaintext), and when you check the password you check: sha256(passphrase + salt) == sha256(passphrase_entered + salt) I'm not doing that, but that wouldn't be hard to add. I didn't think that a salt was necessary with a one-way hash. Salts are a way of combating the use of rainbow tables, which are databases of precomputed values within certain bounds. They make brute force attacks virtually painless, because now it's just a lookup. But don't add a static salt, that's almost as pointless as not using one at all. If you're going to use salts make sure you generate a new one every time, preferably pulling a few bytes from /dev/urandom or similar. If you're really paranoid you'll also do key strengthening, similar to what most system authentication does. Hash with a salt, then hash the result with the salt, repeat a few thousand times. --Arthur Corliss Live Free or Die
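A minimal Python sketch of the scheme Arthur describes -- a fresh random salt from urandom, then an iterated rehash. The 16-byte salt and 10,000-round count are illustrative choices, not recommendations from the thread:

```python
import hashlib
import os

def hash_password(passphrase, salt=None, rounds=10000):
    """Salted, stretched hash: fresh random salt per password, then
    re-hash the digest with the salt a few thousand times so each
    guess costs the attacker 'rounds' hash operations."""
    if salt is None:
        salt = os.urandom(16)  # new salt every time, never reused
    digest = hashlib.sha256(passphrase.encode() + salt).digest()
    for _ in range(rounds):
        digest = hashlib.sha256(digest + salt).digest()
    return salt, digest

def verify(passphrase, salt, expected):
    """Recompute with the stored salt and compare digests."""
    return hash_password(passphrase, salt)[1] == expected
```

In production code you'd reach for a vetted construction such as hashlib.pbkdf2_hmac or bcrypt rather than a hand-rolled loop, and compare digests with hmac.compare_digest to avoid timing leaks; the loop above just makes the stretching idea concrete.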
Re: Help needed testing security of login module
On Wed, 20 May 2009, Jonathan Yu wrote: There are web sites that specialize in that sort of thing. So having a 2-byte salt can really help stop those attacks, or at least make the amount of space needed infeasible (since every different 2 character salt will require you to generate an entirely different rainbow table). 16 bits of salt is roughly less than 64TB for a rainbow table that includes all salt values. That's doable in this day and age; I'd go at least four bytes, if not more. Adding a larger salt incurs virtually no penalty for legitimate users, but makes it uneconomical for the attackers. --Arthur Corliss Live Free or Die
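The arithmetic behind those figures is just multiplication: an attacker who wants full coverage needs one rainbow table per possible salt value. Assuming a purely illustrative 1 GiB table per salt:

```python
def rainbow_storage(per_salt_table_bytes, salt_bytes):
    """Storage needed to precompute rainbow tables covering every
    possible salt: one table per salt value, 256**salt_bytes in all.
    per_salt_table_bytes is an assumed figure, not a measurement."""
    return per_salt_table_bytes * (256 ** salt_bytes)

GIB = 2 ** 30
# 2-byte salt: 65,536 tables -> 64 TiB, matching the thread's figure.
# 4-byte salt: ~4.3 billion tables -> 4 EiB, plainly uneconomical.
```

This is why a couple of extra salt bytes cost legitimate users essentially nothing while pricing the precomputation attack out entirely.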
Re: Help needed testing security of login module
On Wed, 20 May 2009, Jonathan Yu wrote: Not totally pointless, of course, because it would still require regenerating a rainbow table versus downloading one of them already available. On the other hand, depending how popular your application gets, this can be dangerous -- take for example Microsoft's Lan Manager Hash algorithm, LMHash. Even though it is a specialized algorithm, it became popular enough to make it feasible/useful to create and distribute rainbow tables for. So your point is valid in that case, and it never hurts security nor is it a big deal on performance. I would suggest that the benefit of a static salt is marginal at best since many of these hash algorithms aren't exactly computationally intensive on today's hardware. If you have a guy trying to crack passwords from a shadow file he's only got to generate one table for all of them, versus a table per account. It's orders of magnitude more difficult in that regard, especially if you expand the scope to all users of an application everywhere. And /dev/random can be slow, so urandom is a better suggestion, or even better, using /dev/random to seed a random number generator algorithm like the Mersenne Twister (which is essentially what /dev/urandom does) Which was why I included urandom as a suggestion. --Arthur Corliss Live Free or Die
Re: Help needed testing security of login module
On Wed, 20 May 2009, Jonathan Yu wrote: That's a pretty valid point. If it's a simple auth system as I understand it, though, then the users don't have different permissions, so there's really no point in cracking *all* of the passwords if you can download all the data with one. No arguments on that. :-) Virtual users versus real system users will always be a weak link that can be attacked on web apps. And Bill's right in that if someone already has your hashes they probably already have access to the rest of it as well. This exercise is merely about protecting what was used to generate the hashes. Everything else is a separate issue. --Arthur Corliss Live Free or Die
Re: a lot of controversy about Module::Build
On Thu, 9 Apr 2009, Eric Wilhelm wrote: I, as a module author providing you a free product, don't have to give a damn. Realistically, authors give some amount of damn, but maybe not a full "I'll support Perl 5.004 for the poor slobs using ancient Red Hat boxes." Exactly. If you treat Perl like a legacy language, there won't be any new users and you won't have any problems of some new code not being compatible with your old code because there won't be any new code. But if you treat Perl like there's only new installations out there you're going to be ignoring a huge installed base of older machines, and your code won't get used. You guys have the right to do whatever you want with your code, and I'm not advocating that everyone should support fifteen years of Perl revisions. I am, however, saying that if you really want the Perl community to largely benefit from contributions you need to be conscious of what the installed base out there is using. I highly doubt the majority of Perl *users* (not developers) out there are as bleeding edge as yourselves. --Arthur Corliss Live Free or Die
Re: a lot of controversy about Module::Build
On Thu, 9 Apr 2009, David Cantrell wrote: That's a hugely optimistic and naive statement, even if it's true most of the time in the Perl community. But lots of people who use modules from the CPAN aren't really in the perl community, and that's important. Actually, there are lots of people *in* the community who don't keep their toolchain up to date and have no idea why it might be a good idea to upgrade from the CPAN.pm that they installed a few years ago. I think you misread my statement. I was saying that assuming everything just works better because it's a newer rev is, again, optimistic and naive. For that reason many of us choose to lag simply to let you blokes that like cutting yourselves on the bleeding edge sort out the conflicts for us. :-) For that reason I won't adopt Eric's philosophy of wanton upgrading. But, anyway, is it a problem we really need to be inflicting on new Perl users? Do they have to care if somebody might be running 5.8.8 somewhere? With 5.10.0 out for well over a year now? Hell, yes, *I* care. Developers should be aware of portability if they expect the code to run anywhere outside of the machines they control. Yes! I care because not all my machines have been upgraded to 5.10. I care because not all the machines at work have been upgraded. I care because if I deliberately restrict my code to 5.10 only, then I restrict the number of people who will be inclined to do my work for me and send patches to fix my bugs. A common plaint I hear about perl code *from people outside the community* is that we have too many dependencies and our code is too hard to install. And I can sympathise. If you don't know how to configure CPAN.pm to automagically follow dependencies (incidentally, why is the default prerequisites_policy still 'ask' and not 'follow'?) then it's a gigantic pain in the arse. If on top of that you want them to *upgrade perl* they're going to think you're mad. 
And we should care about people outside the community, because they vastly outnumber those of us *in* the community. They and their opinions are important because they do things like influence which technologies their employers use, and consequently how many jobs there are for us. Amen. I bow to your more eloquent explanation. --Arthur Corliss Live Free or Die
Re: a lot of controversy about Module::Build
I hope you guys don't mind if I interject... On Wed, 8 Apr 2009, Eric Wilhelm wrote: That depends on who one is. If you're writing specifically for people who keep their toolchain and perl religiously up-to-date, There's nothing religious about it. You upgrade, it works better. That's a hugely optimistic and naive statement, even if it's true most of the time in the Perl community. Regressions happen. But, anyway, is it a problem we really need to be inflicting on new Perl users? Do they have to care if somebody might be running 5.8.8 somewhere? With 5.10.0 out for well over a year now? Hell, yes, *I* care. Developers should be aware of portability if they expect the code to run anywhere outside of the machines they control. The reality is that there are a lot of installations that lag current perl releases by years, either because some OS versions are in maintenance-mode only, or because many commercial Unices are always slow to upgrade. As I said before, regressions happen, and the bleeding edge is called bleeding for a reason. For those reasons I still test my code back to Perl 5.6.x. And anyway, if the trouble with using something is that it's not core, the fix is not to get it into the core. Rather, we should try to make coreness not matter. You're right, but you're massively oversimplifying the problem. Practical reality has to have influence at some point. I still use EU::MM myself because I know that it will work pretty much everywhere. Not everyone is willing (and rightfully so) to install twenty other modules just to install and use the functionality of one. --Arthur Corliss Live Free or Die
Re: Perl bug, or author bug?
On Thu, 5 Mar 2009, Arthur Corliss wrote: snip Turns out this is an unsigned int to signed int casting problem, not a 64-bit unclean problem. Legitimate bug in Perl either way, and a patch should be submitted to the devs shortly. --Arthur Corliss Live Free or Die
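For anyone following along, the cast Arthur describes is easy to reproduce outside Perl. A minimal sketch (Python used purely for illustration; the value is the AIX "nobody" GID from the original report):

```python
import struct

# AIX assigns "nobody" the unsigned 32-bit GID 4294967294 (0xFFFFFFFE).
gid = 4294967294

# Reinterpret the same 32 bits as a signed int, as a careless C-level
# cast would, and you get -2 -- exactly what getgrent() was returning.
signed = struct.unpack('=i', struct.pack('=I', gid))[0]
print(signed)  # -2
```

The `getpwent()` path evidently keeps the value unsigned, while the `getgrent()` path squeezes it through a signed int somewhere, hence the mismatch.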
Perl bug, or author bug?
Greetings: I have to ask everyone here to ascertain if I'm going daft. Here's the situation: 32-bit Perl on a 64-bit platform. getpwent(2) returns 64-bit UIDs just fine. getgrent(2), however, truncates them to a 32-bit value. GIDs returned from functions like stat(2) work just fine, however. I'm seeing this on AIX 5.3: # perl -e 'do { @pw = getpwent } until $pw[0] eq "nobody"; print $pw[2], "\n";' 4294967294 # perl -e 'do { @pw = getpwent } until $pw[0] eq "nobody"; print $pw[3], "\n";' 4294967294 # perl -e 'do { @pw = getgrent } until $pw[0] eq "nobody"; print $pw[2], "\n";' -2 # grep nobody /etc/passwd /etc/group /etc/passwd:nobody:!:4294967294:4294967294::/: /etc/group:nobody:!:4294967294:nobody,lpd Legitimate bug in Perl? This one is version 5.8.2, BTW. --Arthur Corliss Live Free or Die
Re: Another non-free license - PerlBuildSystem
On Fri, 16 Feb 2007, Ashley Pond V wrote: If there are any law/license experts in the crowd, I'd love to see a formal/named/solid version of this sort of license. It's just about exactly what I've always wanted to put on all my own code. What kind of idiocy is this?! There's a *lot* of people I hate or disagree with, but I believe in *true* freedom when I release my code. I'm not going to tell someone they can't use it just because of their occupation, race, religion, or sexual orientation. I find it particularly ironic that people are targeting some of the very people that allow all of us to express stupid opinions safely. People that petty have far too much time on their hands. Just code, damn it. --Arthur Corliss Live Free or Die
Re: running tests
On Fri, 2 Apr 2004, Tim Harsch wrote: Hi all, If I have several test files in my test suite, is there a way to get them to run in a predefined order when the user runs make test? I realize I could name them alphabetically like Atest1.t, Bsometest.t, but it seems hokey and I'm not sure it would work on all systems. I think a lot of us just use numeric prefixes to control the order: 01_ini.t 02_scalar.t 03_list.t ... etc. --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Re: trouble with MakeMaker finding library
On Wed, 31 Mar 2004, Tim Harsch wrote: Hi all, My module requires a shared library. I'd like for the user to have that library available at LD_LIBRARY_PATH and call it good enough. However, look at the following. I get 'No library found' and note: both LD_LIBRARY_PATH is set and -L specifies the location ***

use ExtUtils::MakeMaker;
die "SGE_ROOT environment variable not defined" unless my $SGE_ROOT = $ENV{SGE_ROOT};
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    'NAME'         => 'Schedule::DRMAAc',
    'VERSION_FROM' => 'DRMAAc.pm', # finds $VERSION
    'PREREQ_PM'    => {},          # e.g., Module::Name => 1.1
    ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
      (ABSTRACT_FROM => 'DRMAAc.pm', # retrieve abstract from module
       AUTHOR        => 'Tim Harsch [EMAIL PROTECTED]') : ()),
    'LIBS'   => ['-ldrmaa', '-lsocket', '-lnsl', '-lm', '-lpthread'],
    'DEFINE' => '', # e.g., '-DHAVE_SOMETHING'
    # Insert -I. if you add *.h files later:
    'INC'    => "-L$SGE_ROOT/lib/sol-sparc -I$SGE_ROOT/include",
    # Un-comment this if you add C files to link with later:
    'OBJECT' => '$(O_FILES)', # link all the C files too
);

***
[868] dna:/home/harsch/CVS/managers/drmaa echo $LD_LIBRARY_PATH
/home/harsch/CVS/managers/drmaa/Schedule:/home/harsch/tmp/SGE_040310/lib/sol-sparc:/usr/local/lib:/usr/local/opt/SUNWspro/lib:/opt/SUNWspro/lib:/usr/openwin/lib:/usr/dt/lib:/usr/4lib
[869] dna:/home/harsch/CVS/managers/drmaa echo $SGE_ROOT/
/home/harsch/tmp/SGE_040310/
[870] dna:/home/harsch/CVS/managers/drmaa perl Makefile.PL
Note (probably harmless): No library found for -ldrmaa
Writing Makefile for Schedule::DRMAAc

BTW, keep in mind that your usage of LIBS treats each array member as a complete set of arguments to ld, which means that if you had '-ldrmaa' as the last array member you would never even have gotten that warning. If all those libraries are mandatory you need to pass them as a single array member.
Quoting the pod: LIBS An anonymous array of alternative library specifications to be searched for (in order) until at least one library is found. In any event, setting LD_LIBRARY_PATH won't override the search path when testing for libraries; you'll need to add it to LIBS as well: LIBS => ["-L$SGE_ROOT/lib/sol-sparc -ldrmaa", ...] --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Re: Namespace suggestions for new module submission (record-level transaction howto?)
On Sat, 3 Jan 2004, david nicol wrote: I am not certain how big Sleepycat's release is any more, but I think a DB::Inline done with Inline.pm wrapping sleepycat code would be an interesting project. That might just move the problem from library synchronization to making sure that everyone has access to a compiler though. Agreed, that's another concern that I have, which is one of my lesser reasons for doing a Pure Perl solution. I would like to hear more about the record level locking and transactions. Perltie does not support these features: what will the interface look like? My own efforts to do record level locking with DirDB from above the perltie are done by: snip Keep in mind that I've got a great deal of performance tuning to do, along with some serious regression tests to make sure everything works the way that I'm intending. This isn't stable code yet. In a nutshell: I'm cheating the system through the use of the transaction log (something that will be in use for *every* write to the db), and I still want to preserve concurrent writes. Outside the nutshell: This system is nothing more than an AVL binary tree implementation, using four files (index, key values, associative values, and the transaction log). The write process goes something like this: check the transaction log for any open transactions for the same record, then write-lock the log and add the entry (concurrently executing transactions can ignore the advisory lock to mark their transaction complete). Then update the appropriate blocks in the relevant files, write-locking only if the files are to be extended (and as before, other writes that aren't extending the files can ignore the lock). As for transactions: what I've described above is all I have at the moment. Atomic record updates. What I'd like to do at some point is add support in the log format definition for multiple record updates, but that isn't done yet.
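(As a rough illustration of the advisory-locking step described above, sketched in Python rather than the Perl the module will actually use; the log file name and entry format here are made up for the example:)

```python
import fcntl
import os
import tempfile

# Hypothetical transaction log; a cooperating writer takes an advisory
# exclusive lock before appending, but a transaction merely marking
# itself complete may deliberately ignore the lock, as described above.
log_path = os.path.join(tempfile.mkdtemp(), "txn.log")

with open(log_path, "ab") as log:
    fcntl.flock(log, fcntl.LOCK_EX)   # advisory: only honored by cooperators
    log.write(b"TXN 1 WRITE key=42\n")
    log.flush()
    fcntl.flock(log, fcntl.LOCK_UN)

with open(log_path, "rb") as log:
    entry = log.read()
print(entry)
```

The point of *advisory* (rather than mandatory) locking is exactly the concurrency trick Arthur describes: writers that cannot conflict are free to skip the lock entirely.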
Another FYI, before someone asks: I chose four files for storage for a reason (all reasons are influenced by my feeble-mindedness, of course). First, I wanted to be able to crawl/rebalance the binary tree with fixed-length records for performance reasons. Second, having separate files for the actual values of the keys and associative values allows me to have full binary storage capability without worrying about special encoding tricks, etc. Outside of my method of tracking available slots of storage (i.e., deleted records) for reuse, there's nothing but data in those two files, not even record demarcation. The transaction log, of course, speaks for itself. Now, if someone knows a better way, I'm all ears. :-) If you're making up your own file format, how about CorlissDB? The only problem I have with that is I don't want to give the impression that this is just another wrapper for yet another C implementation. Many people will assume that they'll need some libraries and end up ignoring it. You said support tied hashes -- Did you mean support for storing hash references? Nope. It will support the hash binding via the tie() function. That's the primary method of use I have for it right now. I added support for hash references to my DirDB (and DirDB::FTP) modules and would appreciate your feedback on the semantics of the interface. They are as follows: When you store a reference to a native perl hash to DirDB, the hash becomes blessed so that further manipulation of the referenced hash manipulates the persistent store as well. When you store a reference to a tied hash to DirDB, you get a deep copy. When you store anything other than a scalar or an unblessed hash reference, the module throws a croak without overwriting or corrupting the stored data. These semantics make it possible to do multi-level autovivification inside a DirDB data structure, even over the network (by FTP). Sounds interesting. I haven't used that module before, but I think I'll go download it and check it out.
I can imagine a few uses for it. As to the semantics, I can't speak intelligently on that until I get a fuller feel of how the module will typically be used. --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Re: Namespace suggestions for new module submission
On Fri, 2 Jan 2004, Mark Stosberg wrote: Will this be implemented with the DBI interface? Then DBD::YourProject seems appropriate. DBD::SQLite seems to be a related case, although it's not Pure Perl, it just allows you install it as a standard DBI driver. I don't think it does enough to warrant inclusion in DBD::*, nor have I planned to make it accessible via DBI. It's just another method for disk-based stateful hashes, like all the *DBM_File modules. Modules like AnyDBM_File and DB_File are causing some unpredictable results in some of my code, depending on the version and implementation of the dbm libs they're linked against. This is just my way of getting predictable results without requiring admins to upgrade or install new system libs, along with the requisite Perl modules. --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Namespace suggestions for new module submission
Greetings: In the near future I'd like to submit a module for inclusion on CPAN. I need some advice on the appropriate namespace, however, since I don't want to pollute the top-level namespace. Unofficial module name (as it's being developed): PerlDBM Synopsis: Pure-perl implementation of a dbm engine. Supported only on platforms with 64-bit filesystems. Database files are portable (all data is stored in network-byte order), with record-level locking and transactions. Has its own API for low-level control, but will also support tied hashes. I did notice that most of the XS wrappers for C-based implementations were all in the top-level namespace, though. Any suggestions/preferences? --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
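(A quick aside on the "network-byte order for portability" point in the synopsis, sketched in Python; the record layout here is invented for the example, not PerlDBM's actual format:)

```python
import struct

# A fixed-length index record packed in network byte order ('!' means
# big-endian, the on-the-wire convention), so the same database file
# reads back identically on little- and big-endian machines.
key_id, flags, offset = 1, 2, 1024
record = struct.pack('!IIQ', key_id, flags, offset)  # 4 + 4 + 8 = 16 bytes

# Round-trips to the same values regardless of host endianness.
assert struct.unpack('!IIQ', record) == (1, 2, 1024)
print(len(record))  # 16
```

Fixed-length records in a known byte order are also what make it practical to seek directly to, and rebalance, tree nodes in the index file.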
Re: Submitting a new module? (Linux::ForkControl)
On Thu, 13 Nov 2003, Brad Lhotsky wrote: snip So I guess, two questions: 1) Anyone see this as useful? 2) Is 'Linux::ForkControl' a decent name for this module? 1) Yes. 2) I almost think that a reversal would be better (i.e., ForkControl::Linux, or similar). Your module could provide a generic interface (along with a working Linux implementation), and others could contribute other platform implementations. I would be interested in implementations on AIX, IRIX, and Solaris, personally. If I can catch a few projects up to date, I'd contribute the modules myself. As an addendum, I think it would be useful to be able to differentiate between CPU load and memory load, placing limits on both. --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Re: RFC: SQL::ExportDB
On Wed, 24 Sep 2003, Michael A Nachbaur wrote: It sucks being in the latest timezone in North America; it makes it that much more difficult to catch up on all the mailing list traffic. Latest timezone, where are you? I'm in Alaska, I thought I was in the latest timezone in NA. ;-) --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto
Re: RFC: SQL::ExportDB
On Wed, 24 Sep 2003, Michael A Nachbaur wrote: Yes, but Alaska doesn't actually count, you see. They always draw it on maps sort of floating next to California, right above Hawaii which, curiously enough, is roughly the same size as Alaska. Odd, that. ;-) LOL. You've got me there. As soon as we find a stretch of coastline that matches our eastern border we'll dock and revisit this discussion. I hear the fault lines in CA would be a good match if you'd hurry up and drop into the ocean. ;-) --Arthur Corliss Bolverk's Lair -- http://arthur.corlissfamily.org/ Digital Mages -- http://www.digitalmages.com/ Live Free or Die, the Only Way to Live -- NH State Motto