I have been doing some work on classification (of Wikipedia) and am
having a hard time actually running the Test classifier. I trained on
a couple of categories (history and science) on quite a few docs, but
now the model is so big, I can't load it, even with almost 3 GB of
memory. I'm just wondering what people would recommend here. One
thought is that our code is really String/Text based. I also notice
we create the maps used to load the models with their default initial
capacities, which probably means we are resizing a lot while loading.
Should we stick with Strings, or would it be better to have some custom
Writables and keep track of the actual terms separately, the way the doc
clustering code does, and also track the size up front so we can avoid
resizing?
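To make that concrete, here is a rough sketch of the kind of thing I
have in mind: a separate term dictionary that maps each term String to
an int id (so the model itself only holds primitive ids), pre-sized
from a known vocabulary count so the map never rehashes while the model
loads. All the class and method names below are made up for
illustration; nothing like this exists in our code yet.

    import java.util.HashMap;
    import java.util.Map;

    public class TermDictionary {
      // term -> id lookup, kept separate from the model weights
      private final Map<String, Integer> termIds;
      // id -> term, so we can map results back to readable terms
      private final String[] terms;
      private int nextId = 0;

      public TermDictionary(int expectedTerms) {
        // pre-size so the map never rehashes while the model loads
        termIds = new HashMap<String, Integer>(expectedTerms * 4 / 3 + 1);
        terms = new String[expectedTerms];
      }

      public int idOf(String term) {
        Integer id = termIds.get(term);
        if (id == null) {
          id = nextId++;
          termIds.put(term, id);
          terms[id] = term;
        }
        return id;
      }

      public String termOf(int id) {
        return terms[id];
      }
    }

The model maps could then be keyed by int (or an IntWritable-style key)
instead of String, which should cut the per-entry overhead quite a bit.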
Also, what training set sizes do people generally use for something
like Naive Bayes (or Complementary Naive Bayes)? Or do I suck it up
and just use more memory?
Thoughts?
-Grant