| Is there any feasible way to avoid duplicates? I often seem to get the same
| version of a tune within search results. I'm guessing, but do people lift abc
| from one site and post it to another?
Is it feasible to create a "hash code" or "signature" for each tune? Length is a start, but one would like to ignore some obvious ways that things can get damaged. So as a start - how about some of these:

1. Note whether or not there are any chords (one bit).
2. Note whether there is anything other than notes present, e.g. !pralltriller!, ~, etc. (one more bit).
3. Throw away all chords, all headers and white space (space, \t, \n, \r), trailing ! or \, and all other annotations like the stuff in category 2 above. Hash what's left - something like a code in the range 1..99 would do fine. Reducing duplicates to 1% would be acceptable.
4. Likewise hash the chords - the reserved hash value 0 meaning "there aren't any chords".
5. As above, but transposed to a standard key...

So the final result is an N-part code made from some of the above (whatever is easy to code up), such as C09-T76-H16, meaning "It does have chords and the chords hash to 09, the tune hashes to 76, and the other stuff (headers etc.) hashes to code 16". Another tune with (say) C00-T76-H16 is very likely to be a copy of exactly the same tune but with no chords.

What do you think? As a user of the tune finder (which I am!!) I think it would make a lot of difference.

Laurie

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
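A rough sketch of steps 1, 3 and 4 above, in Python, just to show the idea is cheap to code up. Everything here is an assumption for illustration rather than anything the tune finder actually does: "chords" is taken to mean text in double quotes, a header is taken to be any line starting with a letter and a colon, CRC32 stands in for the hash, and the names bucket and signature are made up. Step 2's extra bit and the transposition in step 5 are left out.

```python
import re
import zlib

# Assumptions for this sketch (not from any real tune-finder code):
HEADER = re.compile(r"^[A-Za-z]:")       # a header line, e.g. "T:Title"
CHORD = re.compile(r'"[^"\n]*"')         # a chord, e.g. "Am"
DECORATION = re.compile(r"![^!\s]+!|~")  # e.g. !pralltriller! or ~


def bucket(text):
    """Hash text into 1..99; 0 is reserved for 'nothing there'."""
    if not text:
        return 0
    return zlib.crc32(text.encode("utf-8")) % 99 + 1


def signature(abc):
    """Return a three-part code such as C09-T76-H16 for one abc tune."""
    lines = abc.splitlines()
    headers = [ln.strip() for ln in lines if HEADER.match(ln)]
    body = [ln for ln in lines if not HEADER.match(ln)]

    chords = []
    notes = []
    for ln in body:
        chords.extend(CHORD.findall(ln))      # keep chords for their own hash
        ln = CHORD.sub("", ln)                # ...then drop them from the tune
        ln = DECORATION.sub("", ln)           # drop decorations
        ln = ln.rstrip().rstrip("!\\")        # drop trailing ! or \
        notes.append(re.sub(r"\s+", "", ln))  # drop all white space

    c = bucket("".join(chords))
    t = bucket("".join(notes))
    h = bucket("\n".join(headers))
    return f"C{c:02d}-T{t:02d}-H{h:02d}"


if __name__ == "__main__":
    tune = 'X:1\nT:Test\nK:D\n"D"A2 d2 | "G"B2 ~g2 |]\n'
    print(signature(tune))
```

With this, two postings of the same tune that differ only in chords, decorations or spacing should share the T part of the code, and the copy without chords shows up as C00. Unrelated tunes will still collide about one time in 99 per part, which is the 1% mentioned above.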
