On Tue, Dec 29, 2009 at 10:45 AM, Mani Kumar <[email protected]> wrote:
> @Robin: thanks! btw what's the reasoning behind using CBayes for > 2
> categories? While Bayes works for spam/not-spam kinds of
> classification, why not for > 2 categories? It'd be great if you could
> give some pointers to read and understand.

Just a slight difference in the math behind it. CBayes is Bayes, but it
tries to classify objects as not belonging to a class instead of
belonging to a class. For more insight, read the paper on Complementary
Naive Bayes (Rennie et al., "Tackling the Poor Assumptions of Naive
Bayes Text Classifiers"). Do a quick experiment on 20 Newsgroups with
CBayes and Bayes and you will see the difference. Two rough sketches,
one of the complement computation and one of the test-then-train loop
Ted describes below, are at the end of this mail.

> @Ted: Currently I have just started experimenting with Mahout and
> don't have a very clear picture of how it can work for us. I'll send
> you details as I get more experience with Mahout and a deeper
> understanding of our requirements.
>
> Thanks!
> Mani Kumar
>
> On Tue, Dec 29, 2009 at 6:14 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Mani,
> >
> > You are sounding more and more like the poster child for an on-line
> > classifier.
> >
> > The idea would be that you would give your classified docs to the
> > system first for testing, then again for incremental training. You
> > can use the results of the test to adjust the learning rate for the
> > incremental learning.
> >
> > See the work I have started with MAHOUT-228 for the beginnings of
> > this. Let me know where it should go to help with your needs (i.e.
> > what entry points you would need).
> >
> > On Mon, Dec 28, 2009 at 1:33 PM, Mani Kumar
> > <[email protected]> wrote:
> >
> > > Let's talk about bigger numbers: e.g. I have more than 1 million
> > > docs and get 10k new docs every day, out of which 6k are already
> > > classified.
> > >
> > > Monitoring performance is good, but it can be done weekly instead
> > > of daily just to reduce cost.
> > >
> > > I actually wanted to avoid retraining as much as possible because
> > > it comes with a huge cost for a large dataset.
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
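
For anyone curious about the math: here is a toy sketch of the
complement computation, in plain Java. It is not Mahout's CBayes
implementation, just the idea; class and method names are made up for
illustration. Per-class weights are estimated from the documents
*outside* the class, and the winner is the class whose complement
matches the document least:

    // Sketch of the complement-class idea behind CBayes (Complementary
    // Naive Bayes). Not Mahout code; a self-contained illustration.
    public class CnbSketch {
        // docs[i] = term counts for document i, labels[i] = its class id.
        static double[][] complementLogWeights(int[][] docs, int[] labels,
                                               int numClasses, int vocabSize) {
            double[][] w = new double[numClasses][vocabSize];
            double[] totals = new double[numClasses];
            for (int c = 0; c < numClasses; c++) {
                for (int i = 0; i < docs.length; i++) {
                    if (labels[i] == c) continue;      // complement: skip class c
                    for (int t = 0; t < vocabSize; t++) {
                        w[c][t] += docs[i][t];
                        totals[c] += docs[i][t];
                    }
                }
                for (int t = 0; t < vocabSize; t++) {  // Laplace smoothing
                    w[c][t] = Math.log((w[c][t] + 1.0) / (totals[c] + vocabSize));
                }
            }
            return w;
        }

        // Least complement evidence wins, so minimize the weighted sum.
        static int classify(int[] doc, double[][] w) {
            int best = 0;
            double bestScore = Double.POSITIVE_INFINITY;
            for (int c = 0; c < w.length; c++) {
                double score = 0.0;
                for (int t = 0; t < doc.length; t++) score += doc[t] * w[c][t];
                if (score < bestScore) { bestScore = score; best = c; }
            }
            return best;
        }

        public static void main(String[] args) {
            // Tiny toy corpus: 3 classes over a 4-term vocabulary.
            int[][] docs = {{3,0,1,0},{2,1,0,0},{0,3,0,1},
                            {0,2,1,0},{1,0,0,3},{0,0,1,2}};
            int[] labels = {0, 0, 1, 1, 2, 2};
            double[][] w = complementLogWeights(docs, labels, 3, 4);
            System.out.println(classify(new int[]{2, 0, 1, 0}, w)); // prints 0
        }
    }

With only two classes the complement of one class is exactly the other,
which is why plain Bayes and CBayes look the same for spam/not-spam but
diverge once you have many classes with skewed sizes.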
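
And a rough sketch of the test-then-train loop Ted describes: each
incoming pre-classified doc is scored first (a free test), then used
for one step of incremental training, with the step size driven by the
running error rate. The hashed logistic model and all the names here
are assumptions for illustration, not MAHOUT-228's actual interfaces:

    // Test-then-train sketch: score first, then learn from the label.
    public class TestThenTrainSketch {
        static final int DIM = 1 << 16;           // hashed feature space
        double[] weights = new double[DIM];
        double errorRate = 0.5;                   // decayed running error

        double score(int[] features) {            // logistic score, positive class
            double sum = 0.0;
            for (int f : features) sum += weights[f & (DIM - 1)];
            return 1.0 / (1.0 + Math.exp(-sum));
        }

        void testThenTrain(int[] features, int label) {
            double p = score(features);
            int predicted = p >= 0.5 ? 1 : 0;
            // Test first, before this example can influence the model.
            errorRate = 0.99 * errorRate + 0.01 * (predicted == label ? 0.0 : 1.0);
            // Then train: an SGD step whose size tracks how badly we are doing.
            double learningRate = 0.01 + 0.5 * errorRate;
            double gradient = label - p;
            for (int f : features) weights[f & (DIM - 1)] += learningRate * gradient;
        }

        public static void main(String[] args) {
            TestThenTrainSketch model = new TestThenTrainSketch();
            int[][] stream = {{1, 7}, {2, 9}, {1, 7}, {2, 9}};
            int[] labels = {1, 0, 1, 0};
            for (int i = 0; i < stream.length; i++) {
                model.testThenTrain(stream[i], labels[i]);
            }
            System.out.printf("running error ~ %.3f%n", model.errorRate);
        }
    }

This is binary for brevity; Mani's multi-class case would keep one
weight vector per class, but the test-then-train bookkeeping is the
same. The nice property is that every incoming labeled doc gives you an
unbiased accuracy measurement before it is trained on, so the weekly
monitoring Mani wants falls out of the daily flow at no extra cost.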
