Hi, I see. Thanks for your explanation. I had thought that everything in Mahout was supposed to be parallelized.
I agree with Ted: extending k may not yield any improvement, especially in the large-cluster case. *Large-scale learning, however, has at least two levels: one is the algorithm, and the other is data storage or caching.* With Hadoop, users can store a large-scale dataset on a cluster, or even load the dataset into memory, and then run training with the sequential implementation of Pegasos. Is my understanding correct?

Cheers,
Zhendong

On Tue, Dec 22, 2009 at 1:55 PM, Jake Mannix <jake.man...@gmail.com> wrote:

> Zhao,
>
> Mahout is not just for Hadoop-based implementations. We are interested in
> "scalable machine learning" - we currently have *no* SVM implementations
> in Mahout, and would welcome an easy, simple, straightforward SVM; we
> would also find something like the original Pegasos, implemented in our
> APIs, an excellent addition.
>
> If at some point we added a fully parallelized Hadoop-based Pegasos, that
> would be great, sure, but we don't require everything contributed to
> Mahout to run on Hadoop. Currently quite a bit of our libraries have
> nothing parallel about them yet, but they are all aimed at being able to
> scale to large data sets.
>
> Does this make sense?
>
>   -jake
>
> On Mon, Dec 21, 2009 at 9:21 PM, zhao zhendong <zhaozhend...@gmail.com> wrote:
>
> > {quote}
> > k = 1
> > Otherwise as in the Pegasos article. No parallelism.
> > {quote}
> >
> > I'm confused. If that is the case, what is the motivation for
> > integrating Pegasos into Mahout?
> >
> > Can you estimate in which situations this implementation would
> > outperform the original Pegasos? Large-scale data sets, or some other
> > concern?
> >
> > With this implementation, how can we take advantage of the Map-Reduce
> > framework?
> >
> > On Tue, Dec 22, 2009 at 12:44 PM, Ted Dunning (JIRA) <j...@apache.org> wrote:
> >
> > > [ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793497#action_12793497 ]
> > >
> > > Ted Dunning commented on MAHOUT-227:
> > > ------------------------------------
> > >
> > > {quote}
> > > Can you specify this sequential implementation?
> > > {quote}
> > >
> > > k = 1
> > >
> > > Otherwise as in the Pegasos article.
> > >
> > > > Parallel SVM
> > > > ------------
> > > >
> > > >                 Key: MAHOUT-227
> > > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-227
> > > >             Project: Mahout
> > > >          Issue Type: Task
> > > >          Components: Classification
> > > >            Reporter: zhao zhendong
> > > >         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
> > > >
> > > > I wrote a proposal of a parallel algorithm for SVM training. Any
> > > > comment is welcome.
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> >
> > --
> > Zhen-Dong Zhao (Maxim)
> >
> > Department of Computer Science
> > School of Computing
> > National University of Singapore
> >
> > Homepage: http://zhaozhendong.googlepages.com
> > Mail: zhaozhend...@gmail.com

--
Zhen-Dong Zhao (Maxim)
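
[Editor's note: for reference, Ted's "k = 1" prescription corresponds to the plain stochastic sub-gradient update from the Pegasos paper (Shalev-Shwartz et al., 2007): sample one example per iteration, take a step of size 1/(lambda*t) against the hinge-loss sub-gradient, and optionally project onto the ball of radius 1/sqrt(lambda). The sketch below is illustrative only; the class and method names are hypothetical and not part of any Mahout API, and it uses dense double[] arrays with labels in {-1, +1} for brevity.]

    import java.util.Random;

    /** Minimal sketch of the sequential Pegasos SVM update (k = 1). */
    public class SequentialPegasos {

      public static double[] train(double[][] x, int[] y,
                                   double lambda, int iterations) {
        int dim = x[0].length;
        double[] w = new double[dim];
        Random rnd = new Random(42);

        for (int t = 1; t <= iterations; t++) {
          int i = rnd.nextInt(x.length);     // single random example: k = 1
          double eta = 1.0 / (lambda * t);   // step size 1 / (lambda * t)
          double margin = y[i] * dot(w, x[i]);

          // regularization step: scale w by (1 - eta * lambda)
          for (int d = 0; d < dim; d++) {
            w[d] *= (1.0 - eta * lambda);
          }
          // hinge-loss sub-gradient step when the margin is violated
          if (margin < 1.0) {
            for (int d = 0; d < dim; d++) {
              w[d] += eta * y[i] * x[i][d];
            }
          }
          // optional projection onto the ball of radius 1 / sqrt(lambda)
          double norm = Math.sqrt(dot(w, w));
          double radius = 1.0 / Math.sqrt(lambda);
          if (norm > radius) {
            for (int d = 0; d < dim; d++) {
              w[d] *= radius / norm;
            }
          }
        }
        return w;
      }

      private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
          s += a[d] * b[d];
        }
        return s;
      }
    }

Setting k > 1 would turn the per-iteration sample into a mini-batch whose sub-gradients can be averaged, which is where a Map-Reduce formulation could come in; with k = 1, as Ted notes, there is no parallelism.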