On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:
The reason I used countries was that I couldn't think of another
large group of labels.
Also, Wikipedia has over 100K categories, and a document can have
multiple categories too, so finding non-overlapping sets of documents
wasn't easy (non-overlapping sets are what make the classes easy to
differentiate). The first thing I could think of was countries.
Are you saying that you think docs only have one country assigned to
them?
From the little bit of grepping I've done, I think I might try my hand
at something like "school subjects", e.g., Math, History, Science. Of
course, the multiple-categories thing is a bit weird, since we are
trying to classify each document into a single category. For now, the
example simply takes the first matching category found as the chosen
one.
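A rough sketch of that "first one found" rule, for illustration only.
The function name, the label set, and the case-insensitive substring
match are all assumptions made here, not what the example code
actually does:

```python
from typing import Iterable, Optional

# Hypothetical label set; the thread only names Math, History, Science.
SUBJECTS = ("math", "history", "science")


def first_matching_label(
    categories: Iterable[str],
    labels: Iterable[str] = SUBJECTS,
) -> Optional[str]:
    """Return the label for the first category that matches any label.

    Categories are scanned in document order, so a document carrying
    several matching categories gets the label of the earliest one.
    Matching is a case-insensitive substring test (an assumption).
    """
    lowered = tuple(label.lower() for label in labels)
    for category in categories:
        text = category.lower()
        for label in lowered:
            if label in text:
                return label
    # No category matched any label: the document is left unlabeled.
    return None
```

With this rule, a document whose categories are "Ancient history"
followed by "Math education" ends up labeled "history", purely because
that category comes first.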
-Grant