Hi,

I wrote the previous email before reading Josh's email. Are there any
objections if I conclude that implementation of SGD/ASGD-based methods has
priority, and that I will therefore start implementing these methods soon?

@Ted: If this is the case, I am looking forward to having your supervision
on this issue.
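For concreteness, here is a minimal sketch of the plain-SGD update for a linear SVM (L2-regularized hinge loss) in the style Bottou describes in the links quoted below. The function name, step-size schedule, and dense-list data layout are illustrative assumptions on my part, not project code:

```python
import random

def sgd_svm(examples, lam=1e-4, epochs=5, eta0=0.1):
    """Plain SGD for a linear SVM: minimizes
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        random.shuffle(examples)
        for x, y in examples:
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularizer shrinks w on every step; the hinge term only
            # contributes when the example violates the margin.
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w
```

ASGD would additionally keep a running average of the iterates `w` and return that average instead of the final iterate.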

Cheers
Ürün

On Wed, Nov 16, 2011 at 6:42 PM, urun dogan <[email protected]> wrote:

> Hi,
>
> First of all, I do not want to be in the position of defending the Pegasos
> algorithm. I am interested in implementing SGD/ASGD or Pegasos, or even
> both. Of course, what I want is to have useful code for the project (not
> just for me). I do not have a preference among algorithms as long as they
> are fast enough to solve problems.
>
> Based on Leon Bottou's results, I would recommend a simple SGD
>> implementation of SVM rather than Pegasos.
>>
>> http://leon.bottou.org/projects/sgd
>> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
>> http://arxiv.org/abs/1107.2490
>>
>>
>> > However, I think there are two issues we
>> > need to clarify:
>> >
>> > 1) In general, SGD-like ideas are used for online learning (of course
>> > they can be converted to batch learning), and Pegasos is used for batch
>> > learning.
>> >
>>
>> I see no need for batch learning unless there is a net training benefit.
>>
>
> Point taken. I do not have enough large scale experience.
>
>
>>
>>
>> > Therefore, maybe we need two similar but sufficiently different
>> > software architectures (I am not sure). If my intuition is right, then
>> > it makes sense to implement Pegasos and SGD independently. Further,
>> > Pegasos in particular is a state-of-the-art method (in terms of speed)
>> > for text classification, structured data prediction, and these kinds of
>> > problems; maybe this is also a point we need to take into account,
>> > because there are thousands of people dealing with web-scale text data
>> > for search engines and recommender systems (I am not one of them, so
>> > maybe I am wrong here).
>> >
>>
>> Pegasos is nice, but I don't necessarily see it as state of the art.
>>
>> For large-scale problems, in fact, I don't even see SVM as state of the
>> art.  Most (not all) large-scale problems tend to be sparse and very high
>> dimension.  This makes simple linear classifiers with L1 regularization
>> very effective and often more effective than L2 regularization as with
>> SVM.
>>
>
> Point taken. I completely agree that L1 regularization is very effective
> for most (not all) large-scale data sets (at least, I see this in the
> published papers).
>
>>
>>
>>
>> > 2) Pegasos will be faster than any other SVM solver only for linear
>> > kernels.
>>
>>
>> I don't see this in the literature.  See Xu's paper, referenced above.
>>
>
> Your comments are very interesting and I will definitely read Xu's paper
> (thanks a lot). However, when I look at Table 2 of the Shai Shalev-Shwartz
> paper (reference given in my previous email), it seems that Pegasos is not
> faster than SVM-Light when used with a nonlinear kernel. I believe this is
> caused by the kernel evaluations used in kernelized Pegasos (the algorithm
> is given in Figure 3 of the same paper).
>
>
>>
>> > In the past there was a belief that Pegasos can be applied to
>> > nonlinear kernels (Gaussian kernel, string kernel, HMM kernel, etc.)
>> > and will still be faster than other SVM solvers / SMO-like solvers.
>>
>>
>> I am not hearing a huge need for non-linear kernels in large scale
>> learning.  Perhaps with image processing, but not with much else.  Also, I
>> haven't heard that there isn't an SGD-like learning method for non-linear
>> kernels.
>>
> Point taken.
>
>>
>> > ... It is also a known fact that, with appropriate model selection,
>> > nonlinear kernels give better classification accuracy than linear
>> > kernels.
>> >
>>
>> Actually, not.  I think that the situations where non-linear kernels are
>> better are more limited than most suppose, particularly for large-scale
>> applications.
>>
>>
>> > Exactly at this point, we need online learning (an SGD/ASGD-based
>> > method): we can still use nonlinear kernels, parallelize the algorithm,
>> > and have an online SVM method for large/web-scale datasets.
>> >
>>
>> Now this begins to sound right.
>>
>> > Honestly, I am so into SVMs and kernel machines that I fear I am
>> > making a big fuss out of small problems.
>>
>>
>> My key question is whether you have problems that need solving.  Or do you
>> have an itch to do an implementation for the sake of having the
>> implementation?
>>
>> Either one is a reasonable motive, but the first is preferable.
>>
>
> As I mentioned above, I am really interested in applying machine learning
> algorithms to large/web-scale data. This does not mean that I am a fan of
> a particular algorithm. That is why I started this discussion and titled
> it "What to Implement/Improve/Document?". From your comments, and also
> from the comments of other people, I have started to learn a lot (thanks
> a lot). I do not have enough practical experience with large/web-scale
> data problems, so I am open to comments from everyone. Since you stated
> that "I see no need for batch learning unless there is a net training
> benefit", SGD/ASGD-based algorithms should have priority over "batch
> learning" methods. I am looking forward to more comments, and I hope we
> will reach a decision on what to implement.
>
> Cheers
> Ürün
>
>
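For comparison with the plain-SGD variant, the Pegasos update from the Shalev-Shwartz et al. paper discussed above differs mainly in its fixed step-size schedule eta_t = 1/(lambda * t) and an optional projection onto the ball of radius 1/sqrt(lambda). The sketch below is illustrative only (single-example sampling, dense lists), not project code:

```python
import math
import random

def pegasos(examples, lam=1e-3, iters=1000):
    """Pegasos-style primal SVM solver: step size 1/(lam*t) per iteration,
    plus the projection step from the original paper."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for t in range(1, iters + 1):
        x, y = random.choice(examples)   # sample one example per iteration
        eta = 1.0 / (lam * t)            # the Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # With eta = 1/(lam*t), the shrink factor below equals (1 - 1/t).
        w = [wi * (1.0 - eta * lam) for wi in w]
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        # Projection onto the ball of radius 1/sqrt(lam) (optional in
        # later analyses of the algorithm).
        norm = math.sqrt(sum(wi * wi for wi in w))
        bound = 1.0 / math.sqrt(lam)
        if norm > bound:
            w = [wi * (bound / norm) for wi in w]
    return w
```

The kernelized variant (Figure 3 of the paper) replaces the explicit weight vector with per-example coefficients, which is exactly where the kernel-evaluation cost mentioned above comes from.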
