The matrix algebra is just a compact notation for a pattern of arithmetic operations.
Let's actually take documents as rows and words as columns, since that is the more common practice. If you look at the definition of the matrix product B = A'A (where A' is the transpose of A, i.e. it is term by document rather than document by term), you get this:

  b_ij = sum_k a_ki a_kj

If all of the elements of A are binary, then the product inside the sum will be 1 exactly where both a_ki and a_kj are 1. That means the sum will be a count of the documents which contain both word i and word j. This is the cooccurrence count for words i and j. If A is not binary but weighted, then this sum is a weighted similarity between words i and j.

Repeating this trick, you find that B'B = (A'A)'(A'A) is a term by term matrix that measures how similar the cooccurrence vectors are for two words. If you expand it out, you will see that B'B is nothing but a bunch of dot products (i.e. cosines of angles multiplied by magnitudes). You may want to normalize the rows of A'A if you are using weighted arithmetic, or sparsify A'A if you are using counts, but the pattern of operations is the same. (A small numeric sketch of this pattern follows the quoted message below.)

Again, there is nothing magical about the matrix notation. It is just a compact way of describing a bunch of arithmetic. It also lets us tap into a wealth of results that have been derived for linear algebra, which we can either misuse for our purposes or from which we can derive inspiration.

On Wed, Jun 24, 2009 at 6:02 AM, Paul Jones <[email protected]> wrote:

> Okay, aside from being confused by the matrix algebra :-), I am confused
> by the "easy to implement using a doc x term matrix", i.e. I am not sure
> how a doc-term matrix would work out the similarity between words; is it
> not working out the occurrence of words in a doc? Maybe I am
> misunderstanding. Let's say I have a matrix built where the docs are the
> columns and the words are the rows. Now, my limited understanding from
> what I have read says that this matrix can be represented as a number of
> vectors. E.g., say we have one document with 3 words; then the x/y/z axes
> represent each word and its frequency of occurrence, and hence the point
> in space forming the vector depicts this word related to that document.
>
> And this can be expanded. Now if we have 2 documents, with 2 more words,
> we have another point. The distance between them shows how similar they
> are, and hence how similar the documents are to each other.
>
> So far so good, but I am unsure how this translates into showing how
> similar the words themselves are, i.e. co-occurrence. Would that not have
> to be a term-term matrix?
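
To make the pattern of arithmetic concrete, here is a tiny sketch in Python with numpy. This is my own toy example, not anything from the thread; the 3-document, 3-word binary matrix is made up purely for illustration.

    import numpy as np

    # Binary document-by-term matrix A: rows are documents, columns are words.
    # A[d, w] = 1 if word w occurs in document d.
    A = np.array([[1, 1, 0],    # doc 0 contains words 0 and 1
                  [1, 0, 1],    # doc 1 contains words 0 and 2
                  [1, 1, 1]])   # doc 2 contains all three words

    # B = A'A is term by term: B[i, j] counts the documents that contain
    # both word i and word j.
    B = A.T @ A
    print(B)
    # [[3 2 2]
    #  [2 2 1]
    #  [2 1 2]]   e.g. B[0, 1] == 2: words 0 and 1 cooccur in docs 0 and 2

    # B'B compares the cooccurrence vectors of pairs of words: each entry
    # is the dot product of two rows of B (cosine times magnitudes).
    C = B.T @ B
    print(C)

If you normalize the rows of B before forming B'B, the entries come out as plain cosines rather than cosines scaled by magnitudes.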

-- 
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)