On Wed, Nov 16, 2011 at 12:09 AM, urun dogan <[email protected]> wrote:

> Hi All;
>
> As I mentioned, I would find it really interesting to implement SGD and
> Pegasos. We can add Pegasos to the SGD modules.


Based on Leon Bottou's results, I would recommend a simple SGD
implementation of SVM rather than Pegasos.

http://leon.bottou.org/projects/sgd
http://leon.bottou.org/publications/pdf/compstat-2010.pdf
http://arxiv.org/abs/1107.2490
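
As a rough illustration of what a "simple SGD implementation of SVM" could look like, here is a minimal sketch of plain SGD on the L2-regularized hinge loss for a linear model. It is not Bottou's code. The function names, the default values of lam, eta0 and epochs, and the particular decaying step-size schedule are all assumptions chosen for illustration.

```python
import random


def sgd_svm(examples, lam=0.01, eta0=0.5, epochs=20, seed=0):
    """Train a linear SVM by plain SGD on the L2-regularized hinge loss.

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam, eta0, epochs and the step-size schedule are illustrative choices.
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            eta = eta0 / (1.0 + lam * eta0 * t)  # decaying step size
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regularizer shrinks w on every step.
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: step toward classifying x correctly.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
            t += 1
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for a single example."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Each example is touched once per pass and the model is updated immediately, so the same loop serves for online use by feeding examples as they arrive instead of iterating over a stored list.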


> However, I think there are two issues we
> need to clarify:
>
> 1) In general, SGD-like ideas are used for online learning (of course they
> can be converted to batch learning) and Pegasos is used for batch learning.
>

I see no need for batch learning unless there is a net training benefit.


>  Therefore maybe we need two similar but sufficiently different software
> architectures (I am not sure). If my intuition is right, then it makes sense
> to implement Pegasos and SGD independently. Further, Pegasos in particular is
> a state-of-the-art method (in terms of speed) for text classification,
> structured data prediction and similar problems. Maybe this is also
> a point we need to take into account, because there are thousands of people
> dealing with web-scale text data for search engines and recommender
> systems (I am not one of them, so maybe I am wrong here).
>

Pegasos is nice, but I don't necessarily see it as state of the art.

For large-scale problems, in fact, I don't even see SVM as state of the
art.  Most (not all) large-scale problems tend to be sparse and very high
dimensional.  This makes simple linear classifiers with L1 regularization
very effective, and often more effective than the L2 regularization used
in SVMs.
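
To make the L1 point concrete, here is a minimal sketch of a linear classifier trained by SGD with an L1 penalty. It assumes logistic loss and applies the penalty through a soft-thresholding step after each gradient update (one common way to do it, not the only one). The function names and the values of lam, eta and epochs are illustrative assumptions.

```python
import math
import random


def soft_threshold(v, tau):
    """Shrink v toward zero by tau, clipping to exactly zero inside [-tau, tau]."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def sgd_l1_logistic(examples, lam=0.05, eta=0.1, epochs=50, seed=0):
    """SGD for logistic loss with an L1 penalty applied by soft-thresholding.

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam, eta and epochs are illustrative choices, not tuned values.
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient of log(1 + exp(-z)) with respect to w is g * x.
            g = -y / (1.0 + math.exp(z))
            # Gradient step, then shrink each weight by eta * lam.
            w = [soft_threshold(wi - eta * g * xi, eta * lam)
                 for wi, xi in zip(w, x)]
    return w
```

The thresholding step sets a weight to exactly zero whenever its magnitude after the gradient step is at most eta * lam, which is what tends to produce sparse models on sparse, high-dimensional data. An L2 penalty only scales weights down and rarely zeroes them.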



> 2)  Pegasos will be faster than any other SVM solver only for linear
> kernels.


I don't see this in the literature.  See Xu's paper, referenced above.


> In the past there was a belief that Pegasos can be applied to
> nonlinear kernels (Gaussian kernel, string kernel, HMM kernel etc.) and
> will still be faster than other SVM solvers/SMO-like solvers.


I am not hearing a huge need for non-linear kernels in large-scale
learning.  Perhaps with image processing, but not with much else.  Also, I
haven't heard anything suggesting that SGD-like learning methods don't
exist for non-linear kernels.



> ... It is also a known fact that, with appropriate model selection,
> nonlinear kernels give better classification accuracy than linear kernels.
>

Actually, not.  I think that the situations where non-linear kernels are
better are more limited than most suppose, particularly for large-scale
applications.


> Exactly at this point, we need online learning (an SGD/ASGD-based method); we
> can still use nonlinear kernels, parallelize the algorithm and have
> an online SVM method for large/web-scale datasets.
>

Now this begins to sound right.

> Honestly, I am so much into SVMs and kernel machines that I fear I am
> making a big fuss out of small problems.


My key question is whether you have problems that need solving.  Or do you
have an itch to do an implementation for the sake of having the
implementation?

Either one is a reasonable motive, but the first is preferable.
