@Robin: Thanks! By the way, what is the reasoning behind using CBayes for more than two categories? Bayes works for spam/not-spam classification, so why not for more than two categories? It would be great if you could give me some pointers to read so I can understand this.
@Ted: I have only just started experimenting with Mahout and don't yet have a clear picture of how it can work for us. I'll let you know the details as I gain more experience with Mahout and a deeper understanding of our requirements.

Thanks!
Mani Kumar

On Tue, Dec 29, 2009 at 6:14 AM, Ted Dunning <[email protected]> wrote:

> mani,
>
> You are sounding more and more like the poster child for an on-line
> classifier.
>
> The idea would be that you would give your classified docs to the system
> first for testing, then again for incremental training. You can use the
> results of the test to adjust the learning rate for the incremental
> learning.
>
> See the work I have started with MAHOUT-228 for the beginnings of this.
> Let me know where it should go to help with your needs (i.e. what entry
> points that you would need).
>
> On Mon, Dec 28, 2009 at 1:33 PM, Mani Kumar <[email protected]> wrote:
>
> > lets talk about bigger numbers e.g. i have more than 1 million docs and i
> > get 10k new docs every day out of which 6k is already classified.
> >
> > Monitoring performance is good but it can be done weekly instead of daily
> > just to reduce cost.
> >
> > I actually wanted to avoid the retraining as much as possible because it
> > comes with huge cost for large dataset.
> >
>
> --
> Ted Dunning, CTO
> DeepDyve
