Hi, I wrote the previous email before reading Josh's email. Are there any objections if I conclude that implementation of SGD/ASGD-based methods has priority, and that I should therefore start implementing these methods soon?
@Ted: If this is the case, I am looking forward to having your supervision on this issue.

Cheers
Ürün

On Wed, Nov 16, 2011 at 6:42 PM, urun dogan <[email protected]> wrote:
> Hi;
>
> First of all, I do not want to be in the position of defending the
> Pegasos algorithm. I am interested in implementing SGD/ASGD or Pegasos
> or even both. Of course what I want is to have useful code for the
> project (not only for me). I do not have a preference over algorithms
> if they are fast enough to solve problems.
>
>> Based on Leon Bottou's results, I would recommend a simple SGD
>> implementation of SVM rather than Pegasos.
>>
>> http://leon.bottou.org/projects/sgd
>> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
>> http://arxiv.org/abs/1107.2490
>>
>> > However, I think there are two issues we need to clarify:
>> >
>> > 1) In general, SGD-like ideas are used for online learning (of
>> > course they can be converted to batch learning) and Pegasos is used
>> > for batch learning.
>>
>> I see no need for batch learning unless there is a net training
>> benefit.
>
> Point taken. I do not have enough large-scale experience.
>
>> > Therefore maybe we need two similar but sufficiently different
>> > software architectures (I am not sure). If my intuition is right,
>> > then it makes sense to implement Pegasos and SGD independently.
>> > Further, Pegasos in particular is a state-of-the-art method (in
>> > terms of speed) for text classification, structured data prediction
>> > and these kinds of problems. Maybe this is also a point we need to
>> > take into account, because there are thousands of people dealing
>> > with web-scale text data for search engines and recommender systems
>> > (I am not one of them, so maybe I am wrong here).
>>
>> Pegasos is nice, but I don't necessarily see it as state of the art.
>> For large-scale problems, in fact, I don't even see SVM as state of
>> the art.
>> Most (not all) large-scale problems tend to be sparse and very high
>> dimensional. This makes simple linear classifiers with L1
>> regularization very effective, and often more effective than the L2
>> regularization used with SVM.
>
> Point taken. I completely agree that L1 regularization is very
> effective for most (not all) of the large-scale data sets (at least I
> see this in the published papers).
>
>> > 2) Pegasos will be faster than any other SVM solver only for linear
>> > kernels.
>>
>> I don't see this in the literature. See Xu's paper, referenced above.
>
> Your comments are very interesting and I will definitely read Xu's
> paper (thanks a lot). However, when I look at Table 2 of the Shai
> Shalev-Shwartz paper (reference given in my previous email), it seems
> that Pegasos is not faster than SVM-Light when used with a nonlinear
> kernel. I believe this is caused by the kernel evaluations used in
> kernelized Pegasos (the algorithm is given in Figure 3 of the same
> paper).
>
>> > In the past there was a belief that Pegasos can be applied to
>> > nonlinear kernels (Gaussian kernel, string kernel, HMM kernel,
>> > etc.) and will still be faster than other SVM solvers/SMO-like
>> > solvers.
>>
>> I am not hearing a huge need for non-linear kernels in large-scale
>> learning. Perhaps with image processing, but not with much else.
>> Also, I haven't heard that there isn't an SGD-like learning method
>> for non-linear kernels.
>
> Point taken.
>
>> > ... It is also a known fact that, with appropriate model selection,
>> > nonlinear kernels give better classification accuracy than linear
>> > kernels.
>>
>> Actually, not. I think that the situations where non-linear kernels
>> are better are more limited than most suppose, particularly for
>> large-scale applications.
>> > Exactly at this point, we need online learning (SGD/ASGD-based
>> > methods); we can still use nonlinear kernels, parallelize the
>> > algorithm, and have an online SVM method for large/web-scale
>> > datasets.
>>
>> Now this begins to sound right.
>>
>> > Honestly, I am so much into SVMs and kernel machines that I fear I
>> > am making a big fuss out of small problems.
>>
>> My key question is whether you have problems that need solving. Or do
>> you have an itch to do an implementation for the sake of having the
>> implementation?
>>
>> Either one is a reasonable motive, but the first is preferable.
>
> As I mentioned above, I am really interested in applying machine
> learning algorithms to large/web-scale data. This does not mean that I
> am a fan of a particular algorithm. That is why I started this
> discussion and titled it "What to Implement/Improve/Document?". From
> your comments and also from the comments of other people I have
> started to learn a lot (thanks a lot). I do not have enough practical
> experience with large/web-scale data problems, so I am open to
> comments from everyone. As you stated, "I see no need for batch
> learning unless there is a net training benefit", so SGD/ASGD-based
> algorithms should have priority over "batch learning" methods. I am
> looking forward to more comments, and I hope we will reach a decision
> on what to implement.
>
> Cheers
> Ürün
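P.S. To make the discussion above more concrete, here is a minimal
sketch of what the SGD hinge-loss SVM training loop being proposed
could look like. This is only an illustration, not shogun code or
Bottou's actual implementation; all function and parameter names here
(sgd_svm_train, lam, eta0, etc.) are made up for the example. It uses
the 1/(lambda * t) step size from the Pegasos paper; a plain
Bottou-style SGD would use a different learning-rate schedule.

```python
import random

def sgd_svm_train(X, y, lam=0.01, epochs=10):
    """Train a linear SVM weight vector by SGD on the hinge loss.

    Sketch only: X is a list of dense feature lists, y holds labels in
    {-1, +1}, lam is the L2 regularization strength. Uses the
    1/(lam * t) step size of Pegasos; a real implementation would
    support sparse features and other learning-rate schedules.
    """
    d = len(X[0])
    w = [0.0] * d
    t = 1
    for _ in range(epochs):
        order = list(range(len(X)))
        random.shuffle(order)          # visit examples in random order
        for i in order:
            eta = 1.0 / (lam * t)      # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # L2 regularization shrinks w at every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:           # hinge loss active: gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
            t += 1
    return w

def predict(w, x):
    """Classify x by the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

The kernelized variant (Figure 3 of the Shalev-Shwartz paper mentioned
above) replaces the explicit weight vector with a sum of kernel
evaluations against previously misclassified examples, which is exactly
where the extra cost on nonlinear kernels comes from.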
