Re: Categorization stuff

Grant Ingersoll Fri, 17 Jul 2009 07:18:22 -0700

Also, do we have any tools for setting up training/test sets for the Wikipedia examples? Seems like a generally useful thing to have: take annotated data and automatically split it, no?
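A minimal sketch of the "take annotated data and automatically split it" idea, in Python. It assumes the annotated data is already a list of hypothetical `(label, text)` pairs. The function name `split_train_test`, the 20% test fraction, the fixed seed, and the per-label (stratified) split are illustrative choices, not an existing tool in the project.

```python
import random
from collections import defaultdict


def split_train_test(docs, test_fraction=0.2, seed=42):
    # docs: hypothetical list of (label, text) pairs.
    # Group by label so each label is split separately and keeps roughly
    # the same proportion in the training and test sets.
    by_label = defaultdict(list)
    for label, text in docs:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        n_test = int(round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


docs = [("History", f"history doc {i}") for i in range(10)] + \
       [("Math", f"math doc {i}") for i in range(5)]
train, test = split_train_test(docs)
print(len(train), len(test))  # → 12 3
```

Splitting per label rather than over the whole list avoids a small category ending up entirely in one set by chance.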


-Grant


On Jul 17, 2009, at 8:32 AM, Grant Ingersoll wrote:

On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:
The reason I used countries was that I couldn't think of some other, larger group of labels.
Also, Wikipedia has over 100K categories, and a document has multiple categories too. So finding non-overlapping sets of documents (which makes it easy to differentiate them) wasn't easy. The first thing I could think of was countries.
Are you saying that you think docs only have one country assigned tothem?
In the little bit of grepping I've done, I think I might try my hand at something like "school subjects", i.e., Math, History, Science. Of course, the multiple-categories thing is a bit weird, since we are trying to classify into a single category. For now, in the example, the first category found is the chosen one.
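A rough Python sketch of the "first one found is the chosen one" rule, assuming each document comes with its Wikipedia category names in page order. The helpers `pick_label` and `label_documents`, the `(text, categories)` input shape, and the case-insensitive substring match are all hypothetical, not the example's actual code.

```python
def pick_label(categories, targets):
    # Walk the document's categories in order; for each one, try the target
    # labels in order and return the first case-insensitive substring match.
    for category in categories:
        lowered = category.lower()
        for label in targets:
            if label.lower() in lowered:
                return label
    return None  # no target label found: caller should skip this document


def label_documents(docs, targets):
    # docs: hypothetical list of (text, categories) pairs.
    # Returns (label, text) pairs, dropping documents with no matching label.
    labeled = []
    for text, categories in docs:
        label = pick_label(categories, targets)
        if label is not None:
            labeled.append((label, text))
    return labeled


targets = ["Math", "History", "Science"]
print(pick_label(["Category:1905 births",
                  "Category:Science fiction writers",
                  "Category:History of mathematics"], targets))  # → Science
print(pick_label(["Category:Living people"], targets))  # → None
```

The first call shows why the multiple-categories issue is awkward: a science-fiction author is labeled Science just because that category happens to come first, so a stricter match (e.g. a hand-picked list of category names per label) may be worth trying.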
-Grant
