Thomas Ball <[EMAIL PROTECTED]> writes:

> I have information in the form of a list of roughly 500 keyword
> search terms or phrases and associated with each keyword are 3
> counts representing the number of times that word or phrase was used
> on Google, Yahoo, or MSN during the past month.  Ideally I
> would like to come up with a matrix of some type, e.g., a proximity
> or distance-based matrix, for input to a nearest neighbor analysis
> based on this information.  It's just not apparent to me how I can
> do this with the information in this format.
>
> Anybody have any suggestions how I can create a proximity or
> similarity matrix out of this data?

Umm...well...here's one way to do it (there's a rough Python sketch of the whole thing after the outline):

1) For each record in your list, extract the search term as a string,
   and the counts as a three-dimensional tuple/vector/array, using a
   regular expression based on the formatting of your list.

   a) Assign the search term string as a key, and the vector as the
      corresponding value in a hash/dictionary/associative array.

   b) Spool the search term out to a labels file.

2) Run a nested loop, iterating at each level over the keys of your
   hash. The exact logic at the inner level will depend on whether you
   need an upper triangular, lower triangular, or complete matrix as
   input for your nearest neighbor analysis.

   a) Retrieve the vectors corresponding to your two keys, and call a
      proximity function that expects two three-dimensional vectors as
      input (L2, cosine, or whatever floats your boat).

   b) Either spool the resulting value to an output stream (with
      delimiters and formatting appropriate to your nearest neighbor
      analysis) or keep it in another key/value structure (like a DBM
      file), keyed on a concatenation of the two hash keys with a
      delimiter between them (Xkey;Ykey).
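
For concreteness, a quick Python sketch of both steps. Everything about
the file layout here is a guess (I'm pretending your list is a CSV of
term,google_count,yahoo_count,msn_count and inventing the file names),
and I've picked cosine distance arbitrarily for the proximity function:

# Rough sketch only -- adjust the parsing to your actual list format.
import csv
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; swap in L2 or anything else here.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Step 1: term -> (google, yahoo, msn) counts in a dict,
# and spool the terms out to a labels file.
counts = {}
with open("keywords.csv") as infile, open("labels.txt", "w") as labels:
    for row in csv.reader(infile):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        term, g, y, m = row
        counts[term.strip()] = (float(g), float(y), float(m))
        labels.write(term.strip() + "\n")

# Step 2: nested loop over the keys. This writes the complete square
# matrix; restrict the inner loop if you only need a triangle.
terms = sorted(counts)
with open("distances.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for t1 in terms:
        writer.writerow(["%.6f" % cosine_distance(counts[t1], counts[t2])
                         for t2 in terms])

If your nearest neighbor program wants a triangle instead, or you would
rather stash the values in a DBM keyed on "Xkey;Ykey" as in 2b, only the
last loop changes.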

Was this the level of description you wanted?

Dave

----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l
