Jack Campin writes:
| So operations on large-scale ABC databases are likely to become more
| important - things like:
|
| - storing the tunes in a database, parsing and indexing them at entry
|   time
|
| - distributed versioning, so that an ABC creator could get forwarding
|   pointers or editing commands inserted into superseded copies of tunes
|
| - plagiarism search (assume the entire Harry Fox Agency database)
|
| - mending corrupt tunes by finding better versions of garbled parts
|
| - collating information from all copies of a tune, so you could unify
|   the discography from one copy with the detailed formatting code from
|   another
|
| - indexing the corpus of tunes by harmonic progression, as calculated
|   by something like ABCMus's auto-harmonizer, and allowing fuzzy
|   search
|
| All of that would be easier if persistent parse trees were available.
|
| It's a pity the Tune Finder doesn't yet have options to download
| everything it knows about or synchronize your own mirror with it.
| In the long run this might be *less* resource-intensive than what
| it's presently doing, as complete downloads could be offloaded onto
| mirror sites and intelligent synchronization of updates doesn't need
| to be any more expensive than search.

Actually, doing that would be not just feasible; it would be easy, except for that one little elephant hiding over there in the corner: Copyright.

My search bot obviously does download every file that it scans. It normally throws them away. But it has a flag saying to cache any file that contains a tune. I use that occasionally when I'm testing new ideas, so that I don't have to repeatedly hit some poor unsuspecting server for a file. It really speeds up testing to have a few good test files on the local disk. It's set up so that all my search program has to do to use the cached version of a file is to replace a URL's "://" with "/", and the result is the path of the cached file.

However, I've never told people where to find the cache. Most of the time it's empty, and you won't find anything there. This is because I don't have permission to "mirror" everyone's files. I have the space available, but I'm not at all sure I'd even want to try negotiating permissions with the 280 sites that the search bot knows about.

A scan just finished early this morning, and the cache is full at the moment. So if the above is a sufficient clue, interested parties could find it. It'll probably even stay around for a few days, since I've been experimenting with some ideas. But it could vanish at any time.

It is something of a pity that the current copyright laws do so effective a job of blocking useful and innovative ideas like those in Jack's list. Maybe everyone should be getting together lists of such ideas and hitting their politicians for changes in the laws to encourage development. A country that legalizes such innovation could likely become a center of development.

If I could be assured that I wouldn't be prosecuted or banned from using the Internet for doing so, I could easily make my cache a permanent part of my collection. I'd just not delete it. Then I could try writing a web page that lets you combine the cached files, or extract tunes from some of them into a new file, or whatever.
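
For anyone curious what the cache lookup described above amounts to (replace a URL's "://" with "/" and treat the result as a file path), here is a minimal Python sketch. It is only an illustration: the cache root directory `cache`, the function names, and the example URL are all hypothetical, and nothing here says what language or directory layout the actual search bot uses.

```python
from pathlib import Path
from typing import Optional

# Hypothetical cache root; the real cache location is not given in this message.
CACHE_ROOT = Path("cache")


def cached_path(url: str, root: Path = CACHE_ROOT) -> Path:
    """Map a URL to a cache file path by replacing its first "://" with "/"."""
    if "://" not in url:
        raise ValueError(f"not an absolute URL: {url!r}")
    # "http://host/dir/file.abc" becomes "http/host/dir/file.abc"
    relative = url.replace("://", "/", 1)
    return root / relative


def read_cached(url: str, root: Path = CACHE_ROOT) -> Optional[str]:
    """Return the cached copy of a file as text, or None if it is not cached."""
    path = cached_path(url, root)
    if path.is_file():
        return path.read_text(encoding="utf-8", errors="replace")
    return None


if __name__ == "__main__":
    # Example URL is made up for illustration.
    print(cached_path("http://example.org/tunes/reels.abc"))
```

With this layout, a search program can try `read_cached(url)` first and fall back to fetching from the remote server only when it returns `None`, which is the point of keeping test files on the local disk.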
But the way things are going these days, attempting to keep a permanent mirror like that could easily produce some rather huge fines. (I've recently been wondering if it might be time to start learning Mandarin. And there's some wonderful music from that part of the world. ;-)

(And there's the ongoing problem of the variety of what passes for ABC on the Net. I really wish we could get people to stop burying ABC inside HTML. That's a real nightmare for a programmer. It's much worse than the minor differences in ABC dialects. I'd probably just have to ban such tunes from any software that tries to combine things from different sources. Or maybe I'll find the time to write a good DeHTMLizer ...)

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html