As I understand it, the tune finder does the following for every tune:
1. Extracts it from the source file
2. Puts it into a canonical form:
   a. strips leading and trailing blank lines
   b. changes the line endings to a standard form
   c. (dunno about this one) strips trailing whitespace from lines
3. Stores a copy of the canonical form in a database
4. Stores index info (title, key, meter, author, etc.) in the database as well
5. When asked, retrieves the tune and converts it to other formats.
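To make step 2 concrete, here is a rough sketch in Python of what I imagine
the canonicalization looks like. The function name is my own invention, and
this is only my guess at the behaviour, not the tune finder's actual code;
in particular I'm assuming step 2c (trailing whitespace) is applied.

```python
def canonicalize(tune_text: str) -> str:
    """Return a canonical form of one tune (hypothetical helper)."""
    # 2b. Normalize CRLF and bare CR line endings to LF.
    text = tune_text.replace("\r\n", "\n").replace("\r", "\n")
    # 2c. Strip trailing whitespace from every line (assumed, see above).
    lines = [line.rstrip() for line in text.split("\n")]
    # 2a. Drop leading and trailing blank lines.
    while lines and lines[0] == "":
        lines.pop(0)
    while lines and lines[-1] == "":
        lines.pop()
    # End the tune with exactly one newline; an all-blank input becomes "".
    return "\n".join(lines) + "\n" if lines else ""
```

With this, two copies of a tune that differ only in line endings or stray
trailing spaces come out byte-for-byte identical.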
Would it be possible, or make sense, to also store a hash (like a CRC-32 or
an MD5 digest) of the canonical tune in the database, and show only one
result per unique hash on a query?
I think that would remove a large number of spurious duplicates from the
query results. It wouldn't hide the work of different transcribers, or
variants of a tune, but it would remove duplications caused by multiple
copies of the same transcription floating around the web.
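Roughly what I have in mind, again as a Python sketch. The names
tune_hash and unique_by_hash and the (url, text) result tuples are made up
for illustration; I'm assuming the text has already been put into canonical
form, and I've picked MD5 over CRC-32 only because a 32-bit checksum is more
likely to collide by accident in a large collection.

```python
import hashlib


def tune_hash(canonical_text: str) -> str:
    """MD5 hex digest of an already-canonicalized tune (hypothetical helper)."""
    return hashlib.md5(canonical_text.encode("utf-8")).hexdigest()


def unique_by_hash(results):
    """Keep the first (url, canonical_text) pair seen for each distinct hash."""
    seen = set()
    unique = []
    for url, text in results:
        digest = tune_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique
```

In a real database the digest would presumably be computed once at indexing
time and stored in its own column, so the query only has to compare short
strings rather than whole tunes.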