Yes, we can express it with the superstep API very easily. If you're interested in my neural net, please follow: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/classification/nn/BatchBackpropagationBSP.java
I will start once I have a bigger chunk of time ;)

2012/6/14 Suraj Menon <[email protected]>

> Just adding my 2 cents. Thomas, this goes in line with the discussion
> we had recently on how Hama should have a superstep library, where
> each superstep does something that a potential user (in this case, our
> machine learning library) can override and use. A few ideas for the
> superstep library:
>
> 1. RealTimeSuperstep (extends Superstep but does not sync)
> 2. MutualBroadcastSuperstep (extends Superstep; used where all the
> peers have to send all their messages to each other. We should employ
> a peer communication strategy such that every peer internally does not
> have to open an RPC connection with every other peer)
> 3. Mapper and Reducer (I have one WordCount test running for a small
> set of data; it will need more time to increase its scalability. The
> first step of MapReduce would have to use MutualBroadcastSuperstep.)
> 4. OutputCommitter (a Superstep that would write output records to
> HDFS, not based on the peer ID)
> 5. IterativeSuperstep (holds static information on every iteration
> and checkpoints it)
> 6. More expected as we work on new ideas.
>
> -Suraj
>
> On Thu, Jun 14, 2012 at 2:45 PM, Thomas Jungblut <
> [email protected]> wrote:
>
> > I have read a bit about batch neural networks and I think I have
> > found a viable solution for BSP.
> > The funny thing is that it has the same intuition as my k-means
> > clustering.
> >
> > Each task processes a local block of the data, training a full model
> > for itself (making a forward pass and calculating the error of the
> > output neurons against the prediction).
> > After you have iterated over all the observations, you send all the
> > weights of your neurons and the error (say, the average error over
> > all observations) to all the other tasks.
> > After sync, each task has #tasks weights per neuron and the average
> > prediction error; now the weights are accumulated and the backward
> > step with the error begins.
> > When all weights are backpropagated on each task, you can start
> > reading the whole set of observations again and run the next epoch
> > (until some minimum average error has been seen or the maximum
> > number of epochs has been reached).
> >
> > I don't know if that is a common pattern in machine learning, but it
> > seems to me like we can extract some kind of API that helps build
> > local models and combine them again in the next superstep with more
> > information (think of the Pregel API with compute, but on task level
> > rather than vertex level).
> >
> > What do you think about that?
> >
> > 2012/6/14 Thomas Jungblut <[email protected]>
> >
> > > Very cool project. I just need a few vectors and matrices, where I
> > > will use my own library first.
> > >
> > > I'm still having a hard time distributing the network and updating
> > > it accordingly in backprop. If you have smart ideas, let me know.
> > >
> > > 2012/6/14 Tommaso Teofili <[email protected]>
> > >
> > >> Hi Thomas,
> > >> regarding neural networks, I'm also working on them within Apache
> > >> Yay (my Apache Labs project [1]), and I agree it'd make sense to
> > >> run neural network algorithms on top of Hama. However, at this
> > >> stage I have just a prototype in-memory implementation for
> > >> feedforward (no actual learning) neural networks.
> > >> Apart from that, I think we need a math/linear algebra package
> > >> running on top of Hama to make those algorithms scale nicely.
> > >> I agree we can start from batch and then switch to online machine
> > >> learning algorithms.
> > >> Regards,
> > >> Tommaso
> > >>
> > >> [1] : http://svn.apache.org/repos/asf/labs/yay/trunk/
> > >>
> > >> 2012/6/13 Thomas Jungblut <[email protected]>
> > >>
> > >> > I'm going to keep focusing on batch learning; my next aim would
> > >> > be to try out neural networks with BSP.
> > >> >
> > >> > http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414
> > >> >
> > >> > http://techreports.cs.queensu.ca/files/1997-406.pdf
> > >> >
> > >> > Along with the pSVM, we would then have two strong learners. If
> > >> > you're interested, send me a private message. But I have to
> > >> > write a few exams next week, so I'm busy; this is just an idea,
> > >> > and we'll see how fast I can get a prototype.
> > >> >
> > >> > Real time is difficult at the moment; we need the out-of-sync
> > >> > messaging.
> > >> >
> > >> > 2012/6/13 Edward J. Yoon <[email protected]>
> > >> >
> > >> > > Thank you for sharing!
> > >> > >
> > >> > > On Wed, Jun 13, 2012 at 7:03 PM, Tommaso Teofili
> > >> > > <[email protected]> wrote:
> > >> > > > Following up on this discussion on our dev list, I found an
> > >> > > > introductory pdf on online ML which may be useful [1].
> > >> > > > Apart from that, we can start by creating the module
> > >> > > > structure in the Hama svn (still the incubator one, as the
> > >> > > > TLP move seems to take a while).
> > >> > > > Regards,
> > >> > > > Tommaso
> > >> > > >
> > >> > > > [1] :
> > >> > > > http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf
> > >> > > >
> > >> > > > 2012/5/25 Edward J. Yoon <[email protected]>
> > >> > > >
> > >> > > >> I'm roughly thinking of creating a new module so that I can
> > >> > > >> add 3rd-party dependencies easily.
> > >> > > >>
> > >> > > >> On Fri, May 25, 2012 at 4:36 PM, Tommaso Teofili
> > >> > > >> <[email protected]> wrote:
> > >> > > >> > Do you have a plan for that, Edward?
> > >> > > >> > A separate package in examples, or a separate (online)
> > >> > > >> > machine learning module? Or something else?
> > >> > > >> > Regards,
> > >> > > >> > Tommaso
> > >> > > >> >
> > >> > > >> > 2012/5/25 Edward J. Yoon <[email protected]>
> > >> > > >> >
> > >> > > >> >> Okay, then let's get started.
> > >> > > >> >>
> > >> > > >> >> My first idea is a simple online recommendation system
> > >> > > >> >> based on click-stream data.
> > >> > > >> >>
> > >> > > >> >> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > >> > > >> >> <[email protected]> wrote:
> > >> > > >> >> > +1
> > >> > > >> >> >
> > >> > > >> >> > For those who are interested in ML, please check this
> > >> > > >> >> > out (GNU Octave is used):
> > >> > > >> >> >
> > >> > > >> >> > https://www.coursera.org/course/ml
> > >> > > >> >> >
> > >> > > >> >> > Another session is yet to be announced.
> > >> > > >> >> >
> > >> > > >> >> > Thanks,
> > >> > > >> >> > Praveen
> > >> > > >> >> >
> > >> > > >> >> > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > >> > > >> >> > [email protected]> wrote:
> > >> > > >> >> >
> > >> > > >> >> >> +1
> > >> > > >> >> >>
> > >> > > >> >> >> 2012/5/24 Tommaso Teofili <[email protected]>
> > >> > > >> >> >>
> > >> > > >> >> >> > and same here :)
> > >> > > >> >> >> >
> > >> > > >> >> >> > 2012/5/24 Vaijanath Rao <[email protected]>
> > >> > > >> >> >> >
> > >> > > >> >> >> > > +1 me too
> > >> > > >> >> >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi"
> > >> > > >> >> >> > > <[email protected]> wrote:
> > >> > > >> >> >> > >
> > >> > > >> >> >> > > > +1
> > >> > > >> >> >> > > > I would be happy to help :)
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J.
> > >> > > >> >> >> > > > Yoon <[email protected]> wrote:
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > > Hi,
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > Is anyone interested in online machine
> > >> > > >> >> >> > > > > learning?
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > --
> > >> > > >> >> >> > > > > Best Regards, Edward J. Yoon
> > >> > > >> >> >> > > > > @eddieyoon
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > --
> > >> > > >> >> >> > > > Cheers,
> > >> > > >> >> >> > > > Aditya Sarawgi
> > >> > > >> >> >>
> > >> > > >> >> >> --
> > >> > > >> >> >> Thomas Jungblut
> > >> > > >> >> >> Berlin <[email protected]>

--
Thomas Jungblut
Berlin <[email protected]>
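Suraj's superstep-library idea above — small, overridable superstep units chained by a runner, some of which skip the barrier (his RealTimeSuperstep) — could be sketched roughly as below. This is a minimal illustration only: the `Superstep`, `needsSync`, and `run` names are made up for this sketch and are not Hama's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperstepSketch {

  /** One unit of BSP work over shared per-task state S. */
  interface Superstep<S> {
    void compute(S state);
    /** A RealTimeSuperstep-style unit would return false to skip the barrier. */
    default boolean needsSync() { return true; }
  }

  /** Runs a fixed chain of supersteps, counting the barriers taken. */
  static <S> int run(List<Superstep<S>> chain, S state) {
    int syncs = 0;
    for (Superstep<S> step : chain) {
      step.compute(state);
      if (step.needsSync()) {
        syncs++; // a real runner would call peer.sync() here
      }
    }
    return syncs;
  }

  public static void main(String[] args) {
    List<Superstep<int[]>> chain = new ArrayList<>();
    chain.add(state -> state[0] += 1); // ordinary superstep: syncs after
    chain.add(new Superstep<int[]>() {
      @Override public void compute(int[] state) { state[0] *= 2; }
      @Override public boolean needsSync() { return false; } // no barrier
    });
    int[] state = {0};
    int syncs = run(chain, state);
    System.out.println("state=" + state[0] + " syncs=" + syncs); // state=2 syncs=1
  }
}
```

A machine-learning job could then be composed by plugging model-specific `compute` bodies into such a chain, which is exactly the override-and-reuse pattern Suraj describes.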
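Thomas's batch scheme (each task trains on its local block, all tasks exchange their weights, the weights are combined, and the next epoch starts) can be sketched in plain Java with a one-weight linear model standing in for the neural net. The class and method names here are invented for illustration, and averaging the exchanged weights is one plausible reading of "the weights are accumulated" — not necessarily what BatchBackpropagationBSP does.

```java
import java.util.Arrays;

public class WeightAveragingSketch {

  /** One local epoch of gradient descent on a peer's block of the data. */
  static double[] localEpoch(double[] w, double[][] x, double[] y, double lr) {
    double[] out = Arrays.copyOf(w, w.length);
    for (int i = 0; i < x.length; i++) {
      double pred = 0;
      for (int j = 0; j < out.length; j++) pred += out[j] * x[i][j];
      double err = pred - y[i]; // forward pass, error vs. observation
      for (int j = 0; j < out.length; j++) out[j] -= lr * err * x[i][j];
    }
    return out;
  }

  /** After "sync" each task holds #tasks weight vectors; combine them. */
  static double[] average(double[][] perTask) {
    double[] avg = new double[perTask[0].length];
    for (double[] w : perTask)
      for (int j = 0; j < w.length; j++) avg[j] += w[j] / perTask.length;
    return avg;
  }

  public static void main(String[] args) {
    // Two "tasks", each with its own block of y = 2 * x data.
    double[][] x1 = {{1}, {2}}, x2 = {{3}, {4}};
    double[] y1 = {2, 4}, y2 = {6, 8};
    double[] w = {0};
    for (int epoch = 0; epoch < 50; epoch++) {
      double[] wA = localEpoch(w, x1, y1, 0.01); // task A, local block
      double[] wB = localEpoch(w, x2, y2, 0.01); // task B, local block
      w = average(new double[][]{wA, wB});       // "send to all, accumulate"
    }
    System.out.println("learned weight: " + w[0]); // should be close to 2.0
  }
}
```

The `average` call is the single point where the superstep boundary sits: everything before it is embarrassingly parallel, which is why the pattern resembles the k-means intuition Thomas mentions.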
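Suraj's point about a peer communication strategy for MutualBroadcastSuperstep — every peer needs everyone's data without opening an RPC connection to every other peer — is essentially an all-gather. One standard strategy is a ring: each superstep, every peer forwards its newest item to one neighbour, and after p-1 steps everyone holds everything. The following is a plain-Java simulation of that idea under those assumptions, not Hama code.

```java
import java.util.ArrayList;
import java.util.List;

public class RingAllGatherSketch {

  /** Simulates p peers all-gathering their values over a ring. */
  static List<List<String>> allGather(List<String> initial) {
    int p = initial.size();
    List<List<String>> held = new ArrayList<>();
    for (String v : initial) {
      List<String> mine = new ArrayList<>();
      mine.add(v);
      held.add(mine);
    }
    // Each "superstep": every peer sends the item it received most
    // recently to its right neighbour, so each peer only ever opens a
    // connection to one other peer.
    for (int step = 0; step < p - 1; step++) {
      List<String> inFlight = new ArrayList<>();
      for (int i = 0; i < p; i++) {
        List<String> mine = held.get(i);
        inFlight.add(mine.get(mine.size() - 1)); // newest item
      }
      for (int i = 0; i < p; i++) {
        held.get((i + 1) % p).add(inFlight.get(i)); // deliver, then sync
      }
    }
    return held;
  }

  public static void main(String[] args) {
    List<List<String>> result = allGather(List.of("a", "b", "c"));
    System.out.println(result.get(0)); // peer 0 now holds all three values
  }
}
```

The trade-off is p-1 supersteps instead of one, so for small peer counts a direct broadcast may still be cheaper; the ring wins when connection fan-out is the bottleneck.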
