I'd have to admit my interest in SVMs is more in the "abstract curiosity" vein.

If focus is needed in the near term, similar to how Grant tagged:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+MAHOUT_INTRO_CONTRIBUTE

could you then make a list of JIRAs that you think are more interesting, and possibly more relevant, in the near term?

JP

On Wed, Nov 16, 2011 at 10:46 AM, Ted Dunning <[email protected]> wrote:
> On Wed, Nov 16, 2011 at 12:09 AM, urun dogan <[email protected]> wrote:
>
>> Hi All;
>>
>> As I mentioned, I find it really interesting to implement SGD and
>> Pegasos. We can add Pegasos to the SGD modules.
>
> Based on Leon Bottou's results, I would recommend a simple SGD
> implementation of SVM rather than Pegasos.
>
> http://leon.bottou.org/projects/sgd
> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
> http://arxiv.org/abs/1107.2490
>
>> However, I think there are two issues we need to clarify:
>>
>> 1) In general, SGD-like ideas are used for online learning (of course
>> they can be converted to batch learning), and Pegasos is used for batch
>> learning.
>
> I see no need for batch learning unless there is a net training benefit.
>
>> Therefore we may need two similar but sufficiently different software
>> architectures (I am not sure). If my intuition is right, then it makes
>> sense to implement Pegasos and SGD independently. Further, Pegasos in
>> particular is a state-of-the-art method (in terms of speed) for text
>> classification, structured data prediction, and similar problems; maybe
>> this is also a point we need to take into account, because there are
>> thousands of people dealing with web-scale text data for search engines
>> and recommender systems (I am not one of them, so I may be wrong here).
>
> Pegasos is nice, but I don't necessarily see it as state of the art.
>
> For large-scale problems, in fact, I don't even see SVM as state of the
> art. Most (not all) large-scale problems tend to be sparse and very high
> dimensional. This makes simple linear classifiers with L1 regularization
> very effective, and often more effective than the L2 regularization used
> with SVM.
>
>> 2) Pegasos will be faster than any other SVM solver only for linear
>> kernels.
>
> I don't see this in the literature. See Xu's paper, referenced above.
>
>> In the past there was a belief that Pegasos could be applied to
>> nonlinear kernels (Gaussian kernel, string kernel, HMM kernel, etc.)
>> and would still be faster than other SVM/SMO-like solvers.
>
> I am not hearing a huge need for non-linear kernels in large-scale
> learning. Perhaps with image processing, but not with much else. Also, I
> haven't heard that there isn't an SGD-like learning method for non-linear
> kernels.
>
>> ... It is also a known fact that, with appropriate model selection,
>> nonlinear kernels give better classification accuracy than linear
>> kernels.
>
> Actually, no. I think that the situations where non-linear kernels are
> better are more limited than most suppose, particularly for large-scale
> applications.
>
>> Exactly at this point we need online learning (an SGD/ASGD-based
>> method); we can still use nonlinear kernels, parallelize the algorithm,
>> and have an online SVM method for large/web-scale datasets.
>
> Now this begins to sound right.
>
>> Honestly, I am so much into SVMs and kernel machines that I fear I am
>> making a big fuss out of small problems.
>
> My key question is whether you have problems that need solving. Or do you
> have an itch to do an implementation for the sake of having the
> implementation?
>
> Either one is a reasonable motive, but the first is preferable.

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
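[Editor's note: to make the "simple SGD implementation of SVM" and the L1-regularization point above concrete, here is a minimal sketch of hinge-loss SGD for a linear classifier with L1 shrinkage. It is illustrative only, in Python rather than Mahout's Java; the function names, sparse-dict feature format, and all hyperparameter defaults are assumptions of this sketch, not part of Mahout or Bottou's code.]

```python
import random

def sgd_svm_l1(data, dim, epochs=20, eta0=0.5, lam=0.001, seed=0):
    """Train a linear SVM by SGD on the hinge loss with L1 regularization.

    data: list of (features, label), where features is a sparse dict
          {index: value} and label is +1 or -1.
    All hyperparameter defaults are illustrative, not tuned values.
    """
    w = [0.0] * dim
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            # Decaying learning rate (a common SGD schedule).
            eta = eta0 / (1.0 + eta0 * lam * t)
            margin = y * sum(w[i] * v for i, v in x.items())
            if margin < 1.0:  # hinge loss active: take a sub-gradient step
                for i, v in x.items():
                    w[i] += eta * y * v
            # L1 penalty via soft-thresholding: shrink weights toward zero,
            # applied lazily to only the coordinates this example touches.
            shrink = eta * lam
            for i in x:
                if w[i] > shrink:
                    w[i] -= shrink
                elif w[i] < -shrink:
                    w[i] += shrink
                else:
                    w[i] = 0.0
    return w

def predict(w, x):
    """Classify a sparse example with the learned weights."""
    return 1 if sum(w[i] * v for i, v in x.items()) >= 0 else -1
```

The soft-thresholding step is what makes L1 attractive for the sparse, high-dimensional problems Ted describes: weights that the data never pushes away from zero are driven exactly to zero, leaving a sparse model.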
