Laurie Griffiths writes:
>Here's a sort of test case: Take a 32 bar tune. Swap the
>A music with the B music and compare with the original.
>It should show great similarity. The spectral idea would
>since it essentially discards the order of the notes.
>
That's curious. I was thinking of something similar for the opposite
purpose: take the measures two-by-two, and interchange each pair. I think
you'd usually get a dogs breakfast which wasn't close to any known tune
(but which would have some surprisingly familiar spots!) But it would have
exactly the same note-frequencies as the original. In fact, you could do
much worse: take, say, the whole of Beethoven's Fifth, and play all the
G's first, then all the A's, ... Again, this would have the same
note-frequencies as the original symphony. Of course, it wouldn't be close
to any existing music...unless it turned out to be a John Cage
composition. :-)
>A Fourier approach might well, though if there were a
>discontinuity (i.e. big jump) at the phrase boundary either
>introduced or removed by the swap then this would
>create lots of havoc pretty much across the whole spectrum.
>
You've put your finger on the thing I don't like about Fourier
approximation---Fourier series converge slowly when faced with
discontinuities. At the moment, I'm favoring the tent functions, which
wouldn't have this problem.
Phil Taylor writes:
>I just hope that we haven't put off too many of the regular readers
>of this group by getting into the technicalities.
>
Oh, oh! That worries me too. So let me put up a:
WARNING:
This post is long. Too long. Personally, I'd suggest hitting the
delete button right now. I'm going to tread as lightly thru the
technicalities as I can---which means I'll indulge myself in literature
for a while---but I'll eventually have to get down in the trenches. A tiny
little bit, anyway.
The theme is "Why I think that all search algorithms work."
I've been repeating for some time that all reasonable methods of tune
searching should work reasonably well, and I was hoping someone would
challenge me on it so that I could explain why without appearing too
pedantic. But there seems to be a lull in the discussions of other topics,
so perhaps I should go into it now. (It goes without saying that what
follows is IMHO. Your mileage may vary.)
The basic, most important fact about tune-searching is that, compared
with the number of _possible_ tunes, there are very, very, very,...,very
few existing tunes. And I've understated it at that. It turns out that you
can throw away an incredible amount of information about a tune, and still
identify it. For example: my music teacher in junior high school used to
ask students to get up and clap out a tune they knew for the other
students to guess. No words, no music, just hand-clapping. This probably
wouldn't work with jigs and reels, but all we knew was pop music, where
the rhythm is more important, and it was rare that the tune wasn't
identified within the first couple of bars.
Here's a second example: I thought about composing a jig this
morning, picked up a tin whistle and played three notes to start it. They
were AFF, which almost forces the key of D. (This example is _not_ key
independent). Then I thought of the discussion we were having, and checked
to see which jigs started that way. There are some, fewer than one might
think. Then I added a fourth note. Almost any fourth note is
possible---and I'd wager that most of you on this list could improvise an
acceptable tune given any fourth note, so there are lots of possibilities.
(Note: just "acceptable"---I don't require "good". Those three starting
notes almost guarantee a banal tune. Maybe that's why there are so few of
them.)
Anyway, suppose the fourth note is D. Then the jig is "The Little
House Under the Hill." (!) There are probably others, but that's the only
one I found in my collection. (There are others which go to the d instead
of D, but they don't count.) Now suppose that instead of D, the fourth
note is B. Then it's the Pibroch of Donald Dhu. (Well, actually, I didn't
get any matches at all, since Black Donald the Piper is in G, not D, in
O'Neill's.) I didn't check further, but the point is that four notes were
sufficient to identify a tune among the best part of a thousand others.
This isn't isolated. It's almost typical. Make up your own
examples and check it out.
So, what's going on? It's a bit of a numbers game. How many tunes are
there? Breathnach estimated about 6000 Irish dance tunes in circulation.
That's a pretty small fraction of the tunes which have been written, since
most didn't survive to be written down. So what---100,000 maybe? Surely
more. Multiply by ten or a hundred, or better, a thousand, to get the
number of folk tunes which exist or have existed (and I'll classify both
classical and pop as folk music for this purpose.) So---maybe 10^8-10^9
tunes, say. How does that compare with what is possible? Let's just
consider reels. Up to eight notes per bar, A and B parts of eight bars
each, no repeats. Let's keep it within the octave, meaning seven choices
for each note. Sixteen times eight, or 128 notes, each with seven
possibilities...so 7^128 possibles, i.e. a bit over 10^108 possible tunes.
(And that's an under-estimate, since two eighth notes aren't the same as
one quarter note.) For comparison, I remember that in my youth, physicists
estimated that there were a total of about 10^75 sub-atomic particles in
the entire universe (electrons, protons, neutrons, mesons, etc. That's old
fashioned now. The debate these days is whether or not the it's finite.
But finite or not, it's pretty large compared to my salary.)
The argument I just gave is ridiculous, of course. Tunes differing by
a single note aren't really different. And music isn't just an arbitrary
sequence of notes. It's pretty structured. But...maybe not all that much.
Consider computer-generated music. (They talk about "brown noise" as
opposed to the purely random "white noise.") A typical way of generating
this is to analyze the frequency of intervals which occur in a given type
of music, see how and where they occur, and then let the computer choose
notes one-by-one by letting it randomly choose the next interval at each
stage, weighted according to the observed interval frequency in that
situation. The result is usually something you'd only want to listen to
when you really wanted to fall asleep, but, given an indulgent frame of
mind, it is recognizably music---boring, perhaps, but music nonetheless.
Now I don't have any of those algorithms at hand, but I suspect that given
one, it wouldn't be too hard to figure out the number of possible tunes.
Suppose we only consider the first 32 notes. I'd imagine that any
algorithm would allow at least two choices at each note. That would give
2^32 possible choices in all. Or, better, suppose it gives 4 choices. Then
there are 2^64 possible tunes. (I like this one because it ties in with
the famous story of the vizier who, offered his choice of rewards by the
sultan for inventing the game of chess, said, "I desire something very
modest. Please put one grain of wheat for me on the first square of the
chess board, two on the second square, four on the third, and so on." The
sultan was quite happy to oblige such a modest wish, but in fact the last
square would have 2^63 grains, making 2^64 grains in all. The sultan, when
he realized that the vizier had asked for more than the world's total
wheat harvest since the time of Adam, had him strangled, of course.
Rightly so. And let that be a lesson to us all.)
Now these large numbers are fun, but not really significant. We don't
expect to get an exact match of tunes, so the total number of possible
tunes is irrelevant. What about getting a _close_ match? Let's only
consider the first 32 notes.
Now at this point I'm forced to descend a bit further into math.
It's unavoidable. (Making search algorithms is part of music, biology,
physics, chemistry, and higher order hacking, but understanding why they
work is math. End of advertisement. :-)
Once we have translated the music into a sequence of notes, and
perhaps assigned a number to each, we have a 32-vector, or equivalently, a
point in 32-dimensional space. (You know, I just thought: the nice thing
about this is that 32-dimensional space would be totally uninteresting to
us if it weren't created by the music we play! :-)
Don't bother to visualize it; if you must, just draw on a piece of
paper with arrows as vectors. The only important fact about it here is
that it is _big_. So big that the all the 10^9 existing tunes (each one
represented by a point of this space) are _very_ thinly scattered.
(Here's a little exercise. Consider a cube in this space--it's a
32-dimensional cube, but just draw a square on your piece of paper an
you'll have the idea---along with a smaller cube inside it. All we have to
know about them is that all sides of a cube have the same length, and if
that length is S, the volume of the cube is S^32. Question: how large
does the smaller cube have to be to contain half the volume of the large
cube? It turns out that the side of the smaller cube has to be 98% of
the side of the larger cube. Second question: if the smaller cube has side
S/2--i.e. half the length of the side of the larger cube, what proportion
is its volume? The answer is 1/2^32, which is about 2x10^(-10), i.e. damn
small.
So...how does all this apply to tune-searching/analyzing algorithms?
The above illustrations were supposed to convince you that if we pick a
tune, it is pretty unlikely that we will find any other existing tune at
all close to it--no matter how we measure "close". (According to the last
example above, if we take a little cube of side S/2 about each of the 10^9
existing tunes---remember each tune is represented by a point in this
space---the _total_ volume of these cubes fill up less than 1/5 of the
available space. So if you let your computer generate a tune at random,
the odds are 4 to 1 that it_ won't_ be within a distance S/2 of _any_
existing tune.) And if we want to make that cube large enough to include a
fairly large number of tunes, we have to make it almost as large as the
whole original space. (I only talked about Euclidean distance, but the
arguments actually apply more generally.)
But...this is good! It's exactly what we want for a tune searching
algorithm. And it's forced on us! Because the farther apart tunes are,
the easier it is to distinguish between them.
So, finally, I'm getting to the point. All the algorithms I've
seen have a couple of things in common: first, they simplify the tune to
get something they can handle. (The up-down algorithm only considers
whether successive intervals increase or decrease, so it tosses out
everything about the absolute pitch; the frequency algorithm keeps the
pitches, but throws out their order; the gene-comparing algorithm tosses
out whatever isn't common to the two tunes, and measures the common
segments, Breathnach's algorithm only counts notes which happen on the
major beats---he didn't have to deal with syncopation---while the
algorithm I suggested tosses out the exact pitches in favor of a
smoothed-out version which goes up and down more-or-less where the music
does.) But this simplification means that, as far as the algorithm is
concerned, there are a _lot_ of tunes which are identical. For example,
the frequency algorithm regards any two tunes as the same if they have the
same number of notes of each pitch. (A physicist would say that it was
"invariant under permutations of the notes.")
The second thing is that they measure--or can be made to
measure--some kind of distance between tunes. (Someone here suggested that
for the frequency method, one could use the Euclidean distance between
"frequency vectors" to measure tune distance.)
So these algorithms have a common property: there will be many
different tunes which will appear identical to the algorithm. So why do
they work? Simply because the actual tunes are so sparsely scattered.
While there are many _possible_ tunes which can be confused with a given
tune, it's quite unlikely that any of them will be a real one.
So, what is the conclusion? I'll try three of them on you. The first
is that an algorithm should work reasonably well unless it happens to
identify a lot of _existing_ tunes. (This is why I said that the note
frequency approach was "unpromising." Well...perhaps there was a desire to
give Phil a little puck in the ribs...;-) For it tosses away so much info
about the tune---everything about note order---that it would appear
unlikely to work at first glance. But apparently, relatively few of the
tunes you'd get by just permuting notes are viable, so they don't affect
the algorithm. So you could equally well say it was rather perceptive to
realize that it's obvious flaws weren't important.)
And in fact, it might even be hard to design this kind of error
in, even if one wanted to.
The second conclusion is that, to underline Sigfrid Lundberg's point,
that for any serious use, one would want several algorithms, with
different strengths and weaknesses. Because any given algorithm might
identify existing tunes which differ in some way the algorithm doesn't
measure. Or fail to recognize the similarity of tunes which are similar in
ways it doesn't recognize. Or perhaps one should try adaptive
methods---e.g. a rough cut followed by a closer look.
The third conclusion is that, at least at this stage, tune
finding/analyzing algorithms need experiment, not theory. I claimed that
reasonable algorithms should work reasonably well, but this is hardly
enough to give us much confidence in using them. It would be interesting
to experiment with existing algorithms. For example, one could turn one
loose on a large file of (correctly written!) tunes to find the typical
distance between tunes. Once you know that, have it spit out pairs of
tunes which are much closer than the typical distance, and see if they
sound similar; check pairs of tunes known to be closely related to see if
the algorithm agrees, and then tweak parameters as needed...
But enough! If you've read this far, accept my apologies for
taking so much of your time!
Cheers,
John Walsh
To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html