Also, do we have any tools for setting up training/test sets for
Wikipedia examples? Seems like a generally useful thing to have.
Take annotated data and automatically split, no?
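
To make the idea concrete, here is a rough sketch of what such a splitter might look like. It is not an existing tool; the function name `split_by_label`, the `(label, text)` tuple format, and the 80/20 default are all assumptions made for illustration. It splits per label so each category shows up in both sets.

```python
import random
from collections import defaultdict


def split_by_label(examples, test_fraction=0.2, seed=42):
    """Split (label, text) pairs into train and test lists, label by label.

    Hypothetical helper: the name, input format, and defaults are
    assumptions for illustration only.
    """
    # Group the annotated documents by their label.
    by_label = defaultdict(list)
    for label, text in examples:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        docs = by_label[label]
        rng.shuffle(docs)
        # Hold out roughly test_fraction of each label's documents.
        n_test = int(round(len(docs) * test_fraction))
        test.extend(docs[:n_test])
        train.extend(docs[n_test:])
    return train, test


if __name__ == "__main__":
    data = [("math", "doc %d" % i) for i in range(10)]
    data += [("history", "doc %d" % i) for i in range(5)]
    train, test = split_by_label(data)
    print(len(train), "training docs,", len(test), "test docs")
```

Splitting within each label keeps a small category from landing entirely in one set, which a plain shuffle over all documents would not guarantee.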
-Grant
On Jul 17, 2009, at 8:32 AM, Grant Ingersoll wrote:
On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:
The reason I used countries was that I couldn't think of some other
larger group of labels.
Also, Wikipedia has over 100K categories, and a document can have
multiple categories too, so finding non-overlapping sets of documents
(which would make them easy to differentiate) wasn't easy. The first
thing I could think of was countries.
Are you saying that you think docs only have one country assigned to
them?
In the little bit of grepping I've done, I think I might try my hand
at something like "school subjects", e.g. Math, History, Science. Of
course, the multiple categories thing is a bit weird since we are
trying to classify to a single category. For now, in the example, the
first one found is the chosen one.
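
Roughly, the "first one found" rule could be sketched like this. This is only an illustration, not the actual example code; the function name `choose_label` and the case-insensitive exact match are assumptions.

```python
def choose_label(doc_categories, target_labels):
    """Return the first document category that is one of the target labels.

    Hypothetical helper: exact, case-insensitive matching is an assumption;
    the real example may match categories differently.
    """
    targets = {label.lower() for label in target_labels}
    # Walk the categories in document order and stop at the first hit.
    for category in doc_categories:
        if category.lower() in targets:
            return category.lower()
    return None  # no target label found, so the document would be skipped


if __name__ == "__main__":
    subjects = ["math", "history", "science"]
    print(choose_label(["Living people", "History", "Science"], subjects))
```

A document tagged with both History and Science gets whichever appears first in its category list, so the assignment depends on category order rather than on which label fits best.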
-Grant