Also, do we have any tools for setting up training/test sets for Wikipedia examples? Seems like a generally useful thing to have. Take annotated data and automatically split, no?
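
As a rough illustration of the "take annotated data and automatically split" idea, here is a minimal sketch in Python. The function name `split_train_test`, the `(doc_id, label)` tuple layout, and the 80/20 default are assumptions made for this example; they are not an existing tool in the project.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=42):
    """Shuffle labeled examples and split them into (train, test) lists.

    `examples` is any iterable of annotated items, e.g. (doc_id, label)
    tuples. The layout and the default fraction are illustrative assumptions.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items become the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `split_train_test(examples, test_fraction=0.2)` on ten annotated documents would return eight for training and two for testing, with no document in both sets.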

-Grant

On Jul 17, 2009, at 8:32 AM, Grant Ingersoll wrote:


On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:


The reason I used countries was that I couldn't think of another larger group
of labels.
Also, Wikipedia has over 100K categories, and a document can have multiple categories too, so finding non-overlapping sets of documents (which would make them easy to differentiate) wasn't easy. The first thing I could think of was countries.

Are you saying that you think docs only have one country assigned to them?

In the little bit of grepping I've done, I think I might try my hand at something like "school subjects", e.g. Math, History, Science. Of course, the multiple categories thing is a bit weird, since we are trying to classify to a single category. For now, in the example, the first matching category found is the chosen one.
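
To make the "first one found is the chosen one" rule concrete, here is a minimal sketch in Python. The function name `first_matching_label` and the case-insensitive exact match on the category name are assumptions for illustration; the thread does not say how categories are actually matched.

```python
def first_matching_label(doc_categories, target_labels):
    """Return the first document category that is one of the target labels.

    Categories are scanned in the order they appear on the document, so a
    document tagged with several target labels gets only the first one.
    Matching is a case-insensitive exact comparison (an assumption).
    """
    targets = {label.lower(): label for label in target_labels}
    for category in doc_categories:
        label = targets.get(category.strip().lower())
        if label is not None:
            return label
    # No target label found: the document is left out of the labeled set.
    return None
```

With target labels `["Math", "History", "Science"]`, a document whose categories are listed as `["Physics", "History", "Math"]` would be labeled `History`, because that is the first target label encountered.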

-Grant

