Apparently, it was overfitting. I used the test/train split given by Phillipe on the mahout-user list.
When the algorithm stored the weights of all the words in the Complementary Class, the accuracy over the test set was 90.2% and that over the train set itself was 99.32%. But the size of the model ~= number of features x number of labels.

When the algorithm stored the weights of just the words in the Non-Complementary Class, the accuracy over the test set was 84.47% and that over the train set was 99.90%. The model becomes a sparse matrix.

So I guess I will have to go back to the earlier method.

On Sat, Jul 12, 2008 at 11:54 AM, Robin Anil <[EMAIL PROTECTED]> wrote:

> It's too soon for celebrations. This quick hack might have increased
> overfitting. Keep fingers crossed
>
> Robin
>
> On Sat, Jul 12, 2008 at 11:51 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Well done!
>>
>> On Fri, Jul 11, 2008 at 11:18 PM, Robin Anil <[EMAIL PROTECTED]> wrote:
>>
>> > The self classification accuracy on the 20Newsgroups jumped from 98.2 to
>> > 99.87. And it solved the dense matrix problem also
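
For reference, the overfitting call rests on the train-minus-test accuracy gap for the two variants. A minimal sketch of that arithmetic in Python, using only the accuracies reported above; the variant labels and the helper name `generalization_gap` are made up for this illustration and are not part of Mahout:

```python
# Accuracies (percent) as reported in this message.
# The dictionary keys are descriptive labels invented for this sketch.
results = {
    "complement-class weights (dense model)": {"train": 99.32, "test": 90.2},
    "non-complement-class weights (sparse model)": {"train": 99.90, "test": 84.47},
}


def generalization_gap(train_acc, test_acc):
    """Return train accuracy minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)


# A larger gap means the model fits the training set better than it generalizes.
gaps = {
    name: generalization_gap(acc["train"], acc["test"])
    for name, acc in results.items()
}

for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap} points")
```

The sparse variant gains a little on the train set but its gap to the test set is wider, which is what points to overfitting.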
