I'd have to admit my interest in SVMs is more in the "abstract curiosity" vein.

If focus is needed in the near term, similar to how Grant tagged:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+MAHOUT_INTRO_CONTRIBUTE

could you then make a list of JIRAs that you think are more interesting, and possibly more relevant, in the near term?

JP

On Wed, Nov 16, 2011 at 10:46 AM, Ted Dunning <[email protected]> wrote:
> On Wed, Nov 16, 2011 at 12:09 AM, urun dogan <[email protected]> wrote:
>
>> Hi All;
>>
>> As I mentioned, I find it really interesting to implement SGD and
>> Pegasos. We can add Pegasos to the SGD modules.
>
> Based on Leon Bottou's results, I would recommend a simple SGD
> implementation of SVM rather than Pegasos.
>
> http://leon.bottou.org/projects/sgd
> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
> http://arxiv.org/abs/1107.2490
>
>> However, I think there are two issues we need to clarify:
>>
>> 1) In general, SGD-like ideas are used for online learning (of course
>> they can be converted to batch learning), and Pegasos is used for batch
>> learning.
>
> I see no need for batch learning unless there is a net training benefit.
>
>> Therefore we may need two similar but sufficiently different software
>> architectures (I am not sure). If my intuition is right, then it makes
>> sense to implement Pegasos and SGD independently. Further, Pegasos in
>> particular is a state-of-the-art method (in terms of speed) for text
>> classification, structured data prediction, and similar problems; maybe
>> this is also a point we need to take into account, because there are
>> thousands of people dealing with web-scale text data for search engines
>> and recommender systems (I am not one of them, so I may be wrong here).
>
> Pegasos is nice, but I don't necessarily see it as state of the art.
>
> For large-scale problems, in fact, I don't even see SVM as state of the
> art. Most (not all) large-scale problems tend to be sparse and very high
> dimensional. This makes simple linear classifiers with L1 regularization
> very effective, and often more effective than the L2 regularization used
> with SVM.
>
>> 2) Pegasos will be faster than any other SVM solver only for linear
>> kernels.
>
> I don't see this in the literature. See Xu's paper, referenced above.
>
>> In the past there was a belief that Pegasos could be applied to
>> nonlinear kernels (Gaussian kernel, string kernel, HMM kernel, etc.)
>> and would still be faster than other SVM/SMO-like solvers.
>
> I am not hearing a huge need for non-linear kernels in large-scale
> learning. Perhaps with image processing, but not with much else. Also, I
> haven't heard that there isn't an SGD-like learning method for non-linear
> kernels.
>
>> ... It is also a known fact that, with appropriate model selection,
>> nonlinear kernels give better classification accuracy than linear
>> kernels.
>
> Actually, no. I think that the situations where non-linear kernels are
> better are more limited than most suppose, particularly for large-scale
> applications.
>
>> Exactly at this point we need online learning (an SGD/ASGD-based
>> method); we can still use nonlinear kernels, parallelize the algorithm,
>> and have an online SVM method for large/web-scale datasets.
>
> Now this begins to sound right.
>
>> Honestly, I am so much into SVMs and kernel machines that I fear I am
>> making a big fuss out of small problems.
>
> My key question is whether you have problems that need solving. Or do you
> have an itch to do an implementation for the sake of having the
> implementation?
>
> Either one is a reasonable motive, but the first is preferable.

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
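[Editor's note: to make the "simple SGD implementation of SVM" and the L1-regularization point above concrete, here is a minimal sketch of hinge-loss SGD for a linear classifier with L1 shrinkage. It is illustrative only, in Python rather than Mahout's Java; the function names, sparse-dict feature format, and all hyperparameter defaults are assumptions of this sketch, not part of Mahout or Bottou's code.]

```python
import random

def sgd_svm_l1(data, dim, epochs=20, eta0=0.5, lam=0.001, seed=0):
    """Train a linear SVM by SGD on the hinge loss with L1 regularization.

    data: list of (features, label), where features is a sparse dict
          {index: value} and label is +1 or -1.
    All hyperparameter defaults are illustrative, not tuned values.
    """
    w = [0.0] * dim
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            # Decaying learning rate (a common SGD schedule).
            eta = eta0 / (1.0 + eta0 * lam * t)
            margin = y * sum(w[i] * v for i, v in x.items())
            if margin < 1.0:  # hinge loss active: take a sub-gradient step
                for i, v in x.items():
                    w[i] += eta * y * v
            # L1 penalty via soft-thresholding: shrink weights toward zero,
            # applied lazily to only the coordinates this example touches.
            shrink = eta * lam
            for i in x:
                if w[i] > shrink:
                    w[i] -= shrink
                elif w[i] < -shrink:
                    w[i] += shrink
                else:
                    w[i] = 0.0
    return w

def predict(w, x):
    """Classify a sparse example with the learned weights."""
    return 1 if sum(w[i] * v for i, v in x.items()) >= 0 else -1
```

The soft-thresholding step is what makes L1 attractive for the sparse, high-dimensional problems Ted describes: weights that the data never pushes away from zero are driven exactly to zero, leaving a sparse model.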
