Hi,

I see. Thanks for your explanation. I had thought that everything in Mahout
should be parallelized.

I agree with Ted that increasing k may not yield any improvement, especially
on a large cluster. *Large-scale learning, however, has at least two levels:
one is the algorithm, and the other is data storage or caching.* With Hadoop,
users can store a large-scale dataset across the cluster, or even load the
dataset into memory, and then run the training process using the sequential
implementation of Pegasos.
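For reference, the sequential "k = 1" variant Ted describes (one randomly
sampled example per update, as in the Pegasos paper by Shalev-Shwartz et al.)
could be sketched roughly like this; the class and method names here are my
own illustration, not existing Mahout APIs:

```java
import java.util.Random;

// Minimal sketch of sequential Pegasos (mini-batch size k = 1).
// Assumes dense double[] feature vectors and labels in {-1, +1}.
public class PegasosSketch {

  public static double[] train(double[][] x, int[] y,
                               double lambda, int iterations) {
    int dim = x[0].length;
    double[] w = new double[dim];
    Random rnd = new Random(42);
    for (int t = 1; t <= iterations; t++) {
      int i = rnd.nextInt(x.length);       // sample one example (k = 1)
      double eta = 1.0 / (lambda * t);     // step size 1 / (lambda * t)
      double margin = y[i] * dot(w, x[i]); // computed with the old w
      for (int d = 0; d < dim; d++) {
        w[d] *= (1.0 - eta * lambda);      // regularization shrink
        if (margin < 1.0) {
          w[d] += eta * y[i] * x[i][d];    // hinge-loss subgradient step
        }
      }
      // Optional projection onto the ball of radius 1/sqrt(lambda),
      // as in the original paper.
      double norm = Math.sqrt(dot(w, w));
      double bound = 1.0 / Math.sqrt(lambda);
      if (norm > bound) {
        for (int d = 0; d < dim; d++) {
          w[d] *= bound / norm;
        }
      }
    }
    return w;
  }

  private static double dot(double[] a, double[] b) {
    double s = 0.0;
    for (int d = 0; d < a.length; d++) {
      s += a[d] * b[d];
    }
    return s;
  }
}
```

Nothing here touches Map-Reduce; the only scalability lever is that each step
reads a single example, so the dataset itself can live wherever HDFS puts it.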

Is my understanding correct?

Cheers,
Zhendong

On Tue, Dec 22, 2009 at 1:55 PM, Jake Mannix <jake.man...@gmail.com> wrote:

> Zhao,
>
>  Mahout is not just for hadoop-based implementations.  We are interested in
> "scalable
> machine learning" - we currently have *no* SVM implementations in Mahout,
> and would
> welcome an easy simple straightforward SVM, and would find something like
> the original
> Pegasos implemented in our APIs also an excellent addition.
>
>  If at some point we added a fully parallelized hadoop-based Pegasos, that
> would be
> great, sure, but we don't require everything contributed to Mahout to run
> on
> Hadoop.
> Currently quite a bit of our libraries have nothing parallel about them
> yet,
> but they are
> all aimed to be able to scale to large data sets.
>
>  Does this make sense?
>
>  -jake
>
> On Mon, Dec 21, 2009 at 9:21 PM, zhao zhendong <zhaozhend...@gmail.com
> >wrote:
>
> > {quote}
> > k = 1
> > Otherwise as in the Pegasos article.  No parallelism.
> > {quote}
> >
> > I am confused. In that case, what is the motivation for integrating
> > Pegasos into Mahout?
> >
> > Can you estimate in which situations this implementation can outperform
> > the original Pegasos? Large-scale data sets, or some other concern?
> >
> > With this implementation, how can we take advantage of Map-reduce
> > framework?
> >
> >
> > On Tue, Dec 22, 2009 at 12:44 PM, Ted Dunning (JIRA) <j...@apache.org
> > >wrote:
> >
> > >
> > >    [
> > >
> >
> https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793497#action_12793497
> > ]
> > >
> > > Ted Dunning commented on MAHOUT-227:
> > > ------------------------------------
> > >
> > > {quote}
> > > Can you specify this sequential implementation?
> > > {quote}
> > >
> > > k = 1
> > >
> > > Otherwise as in the Pegasos article.
> > >
> > >
> > > > Parallel SVM
> > > > ------------
> > > >
> > > >                 Key: MAHOUT-227
> > > >                 URL:
> https://issues.apache.org/jira/browse/MAHOUT-227
> > > >             Project: Mahout
> > > >          Issue Type: Task
> > > >          Components: Classification
> > > >            Reporter: zhao zhendong
> > > >         Attachments: ParallelPegasos.doc, ParallelPegasos.pdf
> > > >
> > > >
> > > > I wrote a proposal of parallel algorithm for SVM training. Any
> comment
> > is
> > > welcome.
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > >
> > >
> >
> >
> > --
> > -------------------------------------------------------------
> >
> > Zhen-Dong Zhao (Maxim)
> >
> > <><<><><><><><><><>><><><><><>>>>>>
> >
> > Department of Computer Science
> > School of Computing
> > National University of Singapore
> >
> > ><><><><><><><><><><><><><><><><<<<
> > Homepage:http://zhaozhendong.googlepages.com
> > Mail: zhaozhend...@gmail.com
> > >>>>>>><><><><><><><><<><>><><<<<<<
> >
>



-- 
-------------------------------------------------------------

Zhen-Dong Zhao (Maxim)
