Re: [abcusers] abc repository similiar to olga.net?
| 2. Make sure you aren't replicating something that's already been
| replicated, perhaps with mistakes or computer garblement en route.
| We don't need 105 identical versions of The Irish Washerwoman
| hiding one original take on the tune. (An easy way for file
| providers to do this is to add an S: line giving the original
| URL if the tune is a literal copy of one from another site).

Good idea. But figuring out how to do this right isn't easy. It's all too easy for a chunk of software to decide on the worst one. Having a human do it for 100,000 tunes would be a bit of an undertaking.

I was mainly concerned with the case where they're literally identical - the many occasions where a tune has been copied unchanged from one site to another. That's a well and truly solved problem (it's what DNA-matching software does). If one is worse than another they can't be the same; I wasn't proposing any sort of control on *musical* quality, or even syntactical correctness: simply trying to make things easier for the user who has many copies to choose among.

| 3. Provide a human contact for every file (you'll have this anyway
| if you've asked permission) - lots of ABC files raise questions,
| and the TuneFinder interface provides no way of getting answers
| to them, as what you get doesn't even have a URL included.

My tune finder in fact does insert the URL and date if you ask for a tune in TXT or ABC form. It uses the F: header line.

But not when you download an entire file (which is what I always do).

Doing this turned out to be trickier than one might expect. The problem was the variety of line terminators. Just inserting the F: line with an ANSI standard line terminator doesn't work, because a lot of software can't handle files with mixed styles of line terminators. I eventually found, by experiment and a bit of email with people who had problems, that the solution was to strip out the terminators and make them all the same. It doesn't matter whether you use \n or \r\n as long as they're all the same.

The reason for this, at least with the Mac stuff I know about, is that most conversion utilities look at the first line in the file and try to guess what convention it's using from that. If the first line is different from all the others this is maximally confusing for the poor wee thing.

What do you do with a file in EBCDIC? - that makes these variations look rather trivial. Somebody must have ported abc2ps to IBM MVS or ICL VME, surely?

Somebody sometime ought to figure out what has usually gone wrong with all those sites where (at least as viewed from a Mac) all the ABC is double-spaced. I suspect somebody simply used the wrong flag on an email or ftp client that does conversion on the fly, and that the problem is quite easy to avoid no matter what OS and software you've got.

This URL doesn't directly give you an email address. [...] If a site's owner wants to remain incommunicado, it's fairly easy to do.

The Tune Finder can't do much about this, but a planned mirroring site can. There is no obligation to mirror stuff from people who want to make life difficult for their readers and who believe they're such important celebrities that nobody should be informed who they really are. In any case, if that's the way somebody thinks, they can make their lawyer's office the contact. (An email address might not be the right sort of contact, and whatever is provided - ICQ number, mobile phone number, EBay seller id - its validity needs to be checked every so often).

I wonder if we could do something like EBay ratings for tune providers? A feedback message board, even? (Henrik's Irish Washerwoman really does wash whiter...)

The one concern I would have about having my own stuff mirrored is that I'd want the mirror to encourage people to look at my own site too; to a certain extent the ABC files I have available are advertising and I'd like them to function that way. Other people might want the opposite - Demon's server can handle all the hits anyone's likely to throw my way and I've set things up for maximum simplicity, but somebody whose primary server has a wet-piece-of-string connection or a mega-inconvenient user interface might want to offload the work onto Toby's machines.

=== http://www.purr.demon.co.uk/jack/ ===

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
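The line-terminator fix described in the message above can be sketched in a few lines. This is a minimal Python illustration, not the Tune Finder's own (Perl) code: the function names are made up for the example, and placing the F: line directly after the X: line is an assumption, since the message does not say where it goes.

```python
import re


def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Rewrite every line terminator (\\r\\n, \\r or \\n) as `newline`."""
    # "\r\n" is listed first so a CR LF pair is treated as one terminator.
    return re.sub(r"\r\n|\r|\n", newline, text)


def add_source_line(tune: str, url: str, newline: str = "\n") -> str:
    """Insert an F: line giving `url` into one tune, with uniform terminators.

    Assumption for this sketch: the F: line goes right after the X: line,
    or at the top if the tune has no X: line.
    """
    lines = normalize_newlines(tune, "\n").split("\n")
    for i, line in enumerate(lines):
        if line.startswith("X:"):
            lines.insert(i + 1, "F:" + url)
            break
    else:
        lines.insert(0, "F:" + url)
    # Joining with a single terminator guarantees the output is not mixed.
    return newline.join(lines)
```

Whichever terminator is chosen, the point from the thread is that the inserted line and the existing lines end up using the same one.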
Re: [abcusers] abc repository similiar to olga.net?
| Has anyone thought of compiling a centralized database of abc tunes
| similar to olga.net? I find that resource incredibly useful.
| Basically something like John's tune finder, except that it saves
| everything to a local database.
| I would be willing to donate computing power and storage space to
| such a project.

This could be a good idea, as your site is considerably more reliable than MIT. But a few things need to be done to ensure data quality:

1. Make sure the copy is up-to-date. Most ABC files on the web don't change but some of the most interesting ones do.

2. Make sure you aren't replicating something that's already been replicated, perhaps with mistakes or computer garblement en route. We don't need 105 identical versions of The Irish Washerwoman hiding one original take on the tune. (An easy way for file providers to do this is to add an S: line giving the original URL if the tune is a literal copy of one from another site).

3. Provide a human contact for every file (you'll have this anyway if you've asked permission) - lots of ABC files raise questions, and the TuneFinder interface provides no way of getting answers to them, as what you get doesn't even have a URL included. Also, as you're going to be writing a hell of a lot of "I am not responsible for that content" messages to people who fire queries at you, it would seem to be simple self-preservation to be able to name somebody who *is* responsible for it.

None of this should be difficult to arrange for files that are being actively maintained by a live human.

=== http://www.purr.demon.co.uk/jack/ ===
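Point 2 above, spotting tunes that are literal copies of one another, can be sketched as a fingerprint over the tune text. This is only an illustration in Python under stated assumptions: `fingerprint` and `group_copies` are hypothetical names, and ignoring blank lines, trailing whitespace and the S:/F: provenance lines is a choice made for the example so that a copy which adds an S: line still matches its original.

```python
import hashlib
from collections import defaultdict


def fingerprint(tune: str) -> str:
    """Hash a tune so that literal copies get the same value.

    Assumptions of this sketch: line terminators, trailing whitespace,
    blank lines and S:/F: provenance lines are not part of the identity.
    """
    kept = []
    for line in tune.splitlines():  # splitlines() accepts \n, \r and \r\n
        line = line.rstrip()
        if not line or line.startswith(("S:", "F:")):
            continue
        kept.append(line)
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


def group_copies(tunes):
    """Group (url, text) pairs by fingerprint; return groups with 2+ URLs."""
    groups = defaultdict(list)
    for url, text in tunes:
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]
```

This only detects unchanged copies; it makes no judgment about which of two differing versions is better.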
Re: [abcusers] abc repository similiar to olga.net?
| This could be a good idea, as your site is considerably more reliable
| than MIT. But a few things need to be done to ensure data quality:

Wow, thanks Jack. That's a huge compliment, considering my entire professional career is based on trying to make computers reliable. That means a lot to me. Even though those machines are my personal ones, I try to apply the same level of care to them as to the machines that I earn my pay with.
Re: [abcusers] abc repository similiar to olga.net?
| Has anyone thought of compiling a centralized database of abc tunes
| similar to olga.net?

[...]

But I've still not heard anything that makes me think that this sort of centralised abc database has any advantage or real purpose.

JC's tune finder is a (wonderful wonderful wonderful) tool that gives quick search and retrieval access to any abc tune that it knows about, whether it be in Richard's tune book or Henrik's files or my files or wherever. So long as JC's tune finder knows about the file, there is immediate access.

If (naming two of the major collection maintainers as prime examples) Richard or Henrik suddenly threw a major hissy fit and completely removed their tune resources from the web, or suffered a catastrophic life or data event that wiped out the collections, we'd all be the poorer, and in that situation we'd be grateful that someone had taken a mirror before said event - but that isn't what we're talking about - is it?

--
Steve Mansfield [EMAIL PROTECTED]
http://www.lesession.co.uk - abc music notation tutorial, the uk.music.folk newsgroup FAQ, and other goodies
Re: [abcusers] abc repository similiar to olga.net?
Steve Mansfield [EMAIL PROTECTED] writes:
| | Has anyone thought of compiling a centralized database of abc tunes
| | similar to olga.net?
| [...]
| But I've still not heard anything that makes me think that
| this sort of centralised abc database has any advantage or
| real purpose.

I think if all it does is mirror (or worse, copy) stuff that's already on the net, it doesn't.

What I think might be of more value is something that combines a mirror, or even just an index and pointers like John's tune finder, with something that allows submission of tunes that aren't on the net yet. There must be lots of people writing ABC who don't have a website but would like to share their work.

http://www.cpdl.org would be a good model for this. When Rafael Ornes first approached me about having my stuff included, he was thinking in terms of keeping copies on his site, and I told him I wasn't comfortable with that, for the reasons other posters in this thread have cited -- sometimes I make corrections or improvements, and I don't want the uncorrected version lying around after that.

I imagine other people made the same objections, and now the central feature of the site is the database and searching facility, and providing a link to be put in the database is the preferred way of contributing. But I believe that you can also send your work to Rafael and have it both in the database and on the cpdl website.

--
Laura (mailto:[EMAIL PROTECTED] , http://www.laymusic.org/ )
(617) 661-8097 fax: (801) 365-6574
233 Broadway, Cambridge, MA 02139
Re: [abcusers] abc repository similiar to olga.net?
Have you any experience with the counter-attacking methods called 'tarpitting' or 'quicksanding'? I don't recall which; I read a blurb about it a while ago. Specifically, intentionally timing out requests from snakes/spiders/etc. to bog their machines down to the point that they sit up and take notice and possibly act more responsibly? That is, increasing their costs for such actions to a point so high as to make them non-feasible in future.

//Christian

Btw, I appreciate the below. There's a lot of info and jumping-off info that I will be incorporating in future.

John Chambers wrote:
| Christian M. Cepel asks:
| | Does John's script obey robot exclusions? I'm ready to kill Altavista
| | for spidering my javascript validated forms, submitting them empty, and
| | completely ignoring robot exclusions.
|
| Yes, it does. The first thing it does for each site is ask for the
| robots.txt file, and it stays away from directories that have a general
| exclusion. The only exceptions are when someone specifically asks to
| have their music scanned, and then their directory becomes an
| "exception to the exclusion". I think this has only happened once.
|
| [...]

--
Christian Marcus Cepel [EMAIL PROTECTED] icq:12384980
371 CrownPoint, Columbia 65203-2202
w573.882.8309 h443.8676 m268.7533
Computer Support Specialist, Sr.
School of Information Science Learning Technologies, College of Ed, University of Missouri - Columbia
[abcusers] abc repository similiar to olga.net?
Has anyone thought of compiling a centralized database of abc tunes similar to olga.net? I find that resource incredibly useful. Basically something like John's tune finder, except that it saves everything to a local database.

I would be willing to donate computing power and storage space to such a project.

Toby

--
Toby Rider ([EMAIL PROTECTED])

"Some of those parts were totally rubbish, because when you think you're playing well and you're drunk, you're actually playing like an idiot." - Robert Smith

Toby Rider's Understated Homepage: http://www.blackmill.net/toby_rider/
Re: [abcusers] abc repository similiar to olga.net?
I know there are many out there. I'm fond of http://www.thesession.org/

Toby Rider wrote:
| Has anyone thought of compiling a centralized database of abc tunes
| similar to olga.net? I find that resource incredibly useful.
| Basically something like John's tune finder, except that it saves
| everything to a local database.
| I would be willing to donate computing power and storage space to
| such a project.
|
| Toby

--
Christian Marcus Cepel [EMAIL PROTECTED] icq:12384980
371 CrownPoint, Columbia 65203-2202
w573.882.8309 h443.8676 m268.7533
Computer Support Specialist, Sr.
School of Information Science Learning Technologies, College of Ed, University of Missouri - Columbia

* And the wrens have returned are nesting
* In the hollow of that oak where his heart once had been
* And he lifts his arms in a blessing
* For being born again.
--Rich Mullins
Re: [abcusers] abc repository similiar to olga.net?
Toby writes:
| Has anyone thought of compiling a centralized database of abc tunes
| similar to olga.net? I find that resource incredibly useful.
| Basically something like John's tune finder, except that it saves
| everything to a local database.
| I would be willing to donate computing power and storage space to
| such a project.

Well, I've considered it. ;-)

Storage is one reason for not doing it. Another is the question of whether (or which) sites' owners would agree to being mirrored this way. I'd guess that a lot would, but a few would object to the idea. I wouldn't want to collect everyone else's tunes this way without their permission.

Of course, google does this sort of thing, and the google cache is often very useful. But I'd think we'd want a bit of a public discussion before caching other people's tunes like this.
Re: [abcusers] abc repository similiar to olga.net?
Ah, but www.thesession.org requires people to submit their tunes to it. Something that combined John's indexing approach with a comprehensive database of the abc's of the tunes would be incredibly sweet.

| I know there are many out there. I'm fond of http://www.thesession.org/
|
| Toby Rider wrote:
| | Has anyone thought of compiling a centralized database of abc tunes
| | similar to olga.net? I find that resource incredibly useful.

--
Toby Rider ([EMAIL PROTECTED])
Re: [abcusers] abc repository similiar to olga.net?
I'm confused. How is that different from Olga? Olga is made up entirely of submitted ascii files, and ones that were pasted in the newsgroup for examination/distribution.

I understand that you're looking for something different entirely... but it does seem that thesession.org matches olga in this respect at least. Or am I missing something?

Toby Rider wrote:
| Ah, but www.thesession.org requires people to submit their tunes to it.
| Something that combined John's indexing approach with a comprehensive
| database of the abc's of the tunes would be incredibly sweet.

--
Christian Marcus Cepel
Re: [abcusers] abc repository similiar to olga.net?
There is Richard Moon's TuneDB, http://tunedb.woodenflute.com/, which has several thousand tunes in it. It allows searching by name or abc fragment. Very cool.

on 3/3/03 3:33 PM, Toby Rider wrote:
| Has anyone thought of compiling a centralized database of abc tunes
| similar to olga.net? I find that resource incredibly useful.
Re: [abcusers] abc repository similiar to olga.net?
Does John's script obey robot exclusions? I'm ready to kill Altavista for spidering my javascript validated forms, submitting them empty, and completely ignoring robot exclusions.

I see the difference now. Thanks for explaining.

//Christian

Toby Rider wrote:
| Yes, thesession.org does exactly what Olga does. However, combining
| the indexing approach of the abc tune finder with a centralized
| database like Olga or thesession.org would be even better. The only
| issue is permission. Someone would have to contact every site with
| abc tunes that we would possibly want to query for tunes and get
| permission.
| John is running a copy of the tune finder on one of my machines and I
| periodically get emails asking why one of my IP addresses is spidering
| their site. I tell them what it's up to, and they are usually cool
| about it.
|
| Toby
|
| | I'm confused. How is that different from Olga? Olga is made up
| | entirely of submitted ascii files, and ones that were pasted in the
| | newsgroup for examination/distribution.
| | [...]

--
Christian Marcus Cepel
Re: [abcusers] abc repository similiar to olga.net?
Christian M. Cepel asks:
| Does John's script obey robot exclusions? I'm ready to kill Altavista
| for spidering my javascript validated forms, submitting them empty, and
| completely ignoring robot exclusions.

Yes, it does. The first thing it does for each site is ask for the robots.txt file, and it stays away from directories that have a general exclusion. The only exceptions are when someone specifically asks to have their music scanned, and then their directory becomes an "exception to the exclusion". I think this has only happened once.

I also have a significant tune collection (partly from extracting tunes from lists like this one). I was given write access to the robots.txt file on the machine a couple of years ago, and it excludes most of my music stuff. I've found that the big search sites just aren't very good for finding music. And then I have to list my own directories as exceptions to the robots.txt rules, as mentioned in the previous paragraph.

OTOH, if I had a collection of abc songs with lyrics, I'd probably want that searched by the big guys. They're all pretty good at finding lyrics.

I know what you mean about the forms. And there's a similar problem with cgi scripts. Maybe two years ago, I started reading about research into searching for "hidden pages" on the web that can only be found via forms and scripts. My reaction to this was "Uh-oh; I'd better watch for this."

About a year ago they hit. Several search sites started invoking my lookup script systematically with random-looking arguments, and when they got a reply with a form, started exploring the links. They were, in effect, attempting to get every abc tune on the web in every format that my scripts know how to return. One of them hit our server simultaneously from about 30 different addresses, and had over 100 tune conversions outstanding. It brought the server to a screeching halt.

I got enough cpu time to add a "blacklist" to my scripts, and whenever I see symptoms of this, I add their address (or subnet) to the blacklist. And I added a small (5 sec) minimum between requests from the same address. Both of these can be a hassle to people working from behind a firewall, since what my scripts see is the firewall's address, and all users behind it look like a single user. But such things are necessary when there are misbehaving search monsters out there.

One of the side effects of this is that I no longer tell the mailer here to forward my email to my home machine. I log in and read the email here. This means that I'm logged in several times during most days, so that I can keep a constant watch for attacks on the web server. Most of these are probably not malicious; they are more likely from novice searchers. But it's a good idea to spot them fast and install defenses against the new ones.

My search program also has a sort of "reverse blacklist". In its list of starting URLs, I can include URLs or hosts that are to be avoided. I've mentioned this on lists that I subscribe to, with the idea that someone might not want their tunes indexed. So far I haven't actually had anyone say they want to be avoided, but it's a possibility. I mostly use this as a way to keep the search program away from some sites that are known sinkholes of time with no abc tunes. There are some sites that have pages with millions of links, and such things are best ignored.

Another thing I have my searcher do is ignore any URL with "cgi" as a token, i.e., with non-letters on both sides. This is fairly effective at preventing the invocation of scripts without arguments, and that's almost always a pure waste of time. I've been thinking of also excluding things like "php", but so far that hasn't been necessary.

You can learn a lot of weird stuff when you try writing a web search program ...
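The "cgi as a token" rule from the last paragraph can be written as a regular expression with letter lookarounds. The sketch below is Python rather than the searcher's own Perl, `has_token` is a hypothetical name, and matching case-insensitively is an assumption not stated in the message.

```python
import re


def has_token(url: str, token: str = "cgi") -> bool:
    """True if `token` appears in `url` with a non-letter on both sides.

    The start and end of the string count as non-letters. Case-insensitive
    matching is an assumption of this sketch.
    """
    pattern = r"(?<![A-Za-z])" + re.escape(token) + r"(?![A-Za-z])"
    return re.search(pattern, url, re.IGNORECASE) is not None
```

A crawler would skip any link for which `has_token(url)` is true; passing `token="php"` would cover the other case mentioned.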
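The two defenses described in the message above, a blacklist of addresses or subnets plus a minimum gap between requests from one address, might look roughly like this. It is a Python sketch under assumptions: the class and method names are invented for the example, the caller supplies the clock value, and a request rejected for arriving too soon does not reset the timer.

```python
import ipaddress


class RequestGate:
    """Refuse blacklisted clients and clients that come back too quickly."""

    def __init__(self, min_interval: float = 5.0):
        self.min_interval = min_interval  # seconds between accepted requests
        self.blocked = []                 # blacklisted networks
        self.last_seen = {}               # address -> time of last accepted request

    def block(self, address_or_subnet: str) -> None:
        """Blacklist a single address ("192.0.2.17") or a subnet ("192.0.2.0/24")."""
        self.blocked.append(ipaddress.ip_network(address_or_subnet, strict=False))

    def allow(self, address: str, now: float) -> bool:
        """Decide whether a request from `address` at time `now` is served."""
        ip = ipaddress.ip_address(address)
        if any(ip in net for net in self.blocked):
            return False
        last = self.last_seen.get(address)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_seen[address] = now
        return True
```

In a real script `now` would come from something like `time.monotonic()`. As the message notes, everyone behind one firewall shares an address, so both checks treat them as a single client.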
Re: [abcusers] abc repository similiar to olga.net?
Toby asks:
| Good question.. John, do you have an answer?

I wrote about that before seeing this message.

| On a similar note (no pun intended), I'm actually quite impressed at how
| efficient John's program is.. He's really quite a hand at Perl.. Perl
| programs are notoriously CPU hungry.. John's program runs really tight..
| That machine also serves up about 10 moderate traffic websites, runs lpd
| for a couple printers, and has the Thunderstone search engine periodically
| cranking away.. I never even notice John's program running away in the
| background..

An interesting aspect of the perl story is that its performance in many cases is competitive with even fairly good C code. There have been a number of reports of people who decide to rewrite an important perl program in C, and find that the C version is slower. The perl gang has learned some good tricks, and unless you know a lot about what you're doing, you'll have trouble matching what they've learned over the years.

The main reason that a perl program can gobble cpu is that some things are very easy in perl that are difficult in most other languages. The language includes symbol-table lookups in a deceptively simple form, as a kind of array that takes character strings as a subscript. It's so easy to use that perl programmers learn to use it for everything. Anyone who has ever written a table lookup routine knows how much cpu time it takes. In most other languages, a symbol table is a big hairy deal that you use only as a last resort. In perl, you use them because it's easy. And if you don't understand the implications, you can end up with a very greedy little program. If you understand, it's just another very handy tool. I use tables a lot, but I'm always aware that that very simple indexing operation is expensive. But the perl interpreter has some of the most sophisticated table-handling routines known. Unless you're a real expert, you aren't going to improve on them.

Perl can also gobble memory. One of the features of the language is the ability to slurp up (a technical term) an entire file into an array of strings. It only takes a few characters of punctuation:

    @data = <FILE>;

This reads the entire contents of FILE into the data array. It's fast and easy, and there are a lot of things that will operate on the entire array. Then the command

    @data = ();

frees the space. This is a powerful part of perl. But if you aren't aware of what it does, it can produce a monster program.

My search bot doesn't do this. In fact, it uses fixed-length reads, to avoid the problems of web sites like Mac sites that don't have line feeds within their pages.

| Of course having dual CPU's on there and a lot of RAM helps :-)

Yes, and my code is single-threaded, so it shouldn't ever use more than one cpu. It spends most of its time waiting for a TCP connection to go through. This typically takes longer than reading the data. A web search program that makes only one connection at a time really can't use much cpu time. Most of its time will be spent waiting on network events.

OTOH, I've been contemplating stuffing some info into a database ...
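To illustrate the contrast drawn above between slurping a whole file and using fixed-length reads, here is a small Python sketch. It is not the search bot's code (which is Perl), and `iter_lines` and the default chunk size are assumptions for the example. It reads a fixed number of bytes at a time and treats \r, \n and \r\n all as line ends, so a page with bare carriage returns is still seen line by line.

```python
import io


def iter_lines(stream, chunk_size: int = 4096):
    """Yield lines from a binary stream using fixed-length reads.

    Memory use is bounded by the longest line plus one chunk, rather than
    by the size of the whole file.
    """
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = (pending + chunk).splitlines(keepends=True)
        # Hold the last piece back: it may be unterminated, or end in a
        # "\r" whose matching "\n" arrives at the start of the next chunk.
        pending = pieces.pop()
        for piece in pieces:
            yield piece.rstrip(b"\r\n")
    if pending:
        yield pending.rstrip(b"\r\n")
```

For example, `list(iter_lines(io.BytesIO(data)))` walks a page held in memory; any object with a `read(n)` method that returns bytes would work the same way.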