Yes, we can express it with the superstep API very easily. If you're interested in my neural net, please follow: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/classification/nn/BatchBackpropagationBSP.java
I will start once I have a bigger chunk of time ;)

2012/6/14 Suraj Menon <[email protected]>

> Just adding my 2 cents. Thomas, this goes in line with the discussion
> we had recently on how Hama should have a superstep library, where
> each superstep does something that a potential user (in this case, our
> machine learning library) can override and use. A few ideas for the
> superstep library:
>
> 1. RealTimeSuperstep (extends Superstep but does not sync)
> 2. MutualBroadcastSuperstep (extends Superstep; used where all the
> peers have to send all their messages to each other. We should employ
> a peer communication strategy such that every peer internally does not
> have to open an RPC connection with every other peer)
> 3. Mapper and Reducer (I have one WordCount test running for a small
> set of data; it will need more time to increase its scalability. The
> first step of MapReduce would have to use MutualBroadcastSuperstep.)
> 4. OutputCommitter (a Superstep that would write output records to
> HDFS, not based on the peer ID)
> 5. IterativeSuperstep (holds static information on every iteration
> and checkpoints it)
> 6. More expected as we work on new ideas.
>
> -Suraj
>
> On Thu, Jun 14, 2012 at 2:45 PM, Thomas Jungblut <
> [email protected]> wrote:
>
> > I have read a bit about batch neural networks and I think I have
> > found a viable solution for BSP.
> > The funny thing is that it has the same intuition as my k-means
> > clustering.
> >
> > Each task processes a local block of the data, training a full model
> > for itself (making a forward pass and calculating the error of the
> > output neurons against the prediction).
> > After you have iterated over all the observations, you send all the
> > weights of your neurons and the error (say, the average error over
> > all observations) to all the other tasks.
> > After sync, each task has #tasks weights per neuron and the average
> > prediction error; now the weights are accumulated and the backward
> > step with the error begins.
> > When all weights are backpropagated on each task, you can start
> > reading the whole set of observations again and run the next epoch
> > (until some minimum average error has been seen or the maximum
> > number of epochs has been reached).
> >
> > I don't know if that is a common pattern in machine learning, but it
> > seems to me like we can extract some kind of API that helps build
> > local models and combine them again in the next superstep with more
> > information (think of the Pregel API with compute, but on task level
> > rather than vertex level).
> >
> > What do you think about that?
> >
> > 2012/6/14 Thomas Jungblut <[email protected]>
> >
> > > Very cool project. I just need a few vectors and matrices, where I
> > > will use my own library first.
> > >
> > > I'm still having a hard time distributing the network and updating
> > > it accordingly in backprop. If you have smart ideas, let me know.
> > >
> > > 2012/6/14 Tommaso Teofili <[email protected]>
> > >
> > >> Hi Thomas,
> > >> regarding neural networks, I'm also working on them within Apache
> > >> Yay (my Apache Labs project [1]), and I agree it'd make sense to
> > >> run neural network algorithms on top of Hama. However, at this
> > >> stage I have just a prototype in-memory implementation for
> > >> feedforward (no actual learning) neural networks.
> > >> Apart from that, I think we need a math/linear algebra package
> > >> running on top of Hama to make those algorithms scale nicely.
> > >> I agree we can start from batch and then switch to online machine
> > >> learning algorithms.
> > >> Regards,
> > >> Tommaso
> > >>
> > >> [1] : http://svn.apache.org/repos/asf/labs/yay/trunk/
> > >>
> > >> 2012/6/13 Thomas Jungblut <[email protected]>
> > >>
> > >> > I'm going to keep focusing on batch learning; my next aim would
> > >> > be to try out neural networks with BSP.
> > >> >
> > >> > http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414
> > >> >
> > >> > http://techreports.cs.queensu.ca/files/1997-406.pdf
> > >> >
> > >> > Along with the pSVM, we would then have two strong learners. If
> > >> > you're interested, send me a private message. But I have to
> > >> > write a few exams next week, so I'm busy; this is just an idea,
> > >> > and we'll see how fast I can get a prototype.
> > >> >
> > >> > Real time is difficult at the moment; we need the out-of-sync
> > >> > messaging.
> > >> >
> > >> > 2012/6/13 Edward J. Yoon <[email protected]>
> > >> >
> > >> > > Thank you for sharing!
> > >> > >
> > >> > > On Wed, Jun 13, 2012 at 7:03 PM, Tommaso Teofili
> > >> > > <[email protected]> wrote:
> > >> > > > Following up on this discussion on our dev list, I found an
> > >> > > > introductory pdf on online ML which may be useful [1].
> > >> > > > Apart from that, we can start by creating the module
> > >> > > > structure in the Hama svn (still the incubator one, as the
> > >> > > > TLP move seems to take a while).
> > >> > > > Regards,
> > >> > > > Tommaso
> > >> > > >
> > >> > > > [1] :
> > >> > > > http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf
> > >> > > >
> > >> > > > 2012/5/25 Edward J. Yoon <[email protected]>
> > >> > > >
> > >> > > >> I'm roughly thinking of creating a new module so that I can
> > >> > > >> add 3rd-party dependencies easily.
> > >> > > >>
> > >> > > >> On Fri, May 25, 2012 at 4:36 PM, Tommaso Teofili
> > >> > > >> <[email protected]> wrote:
> > >> > > >> > Do you have a plan for that, Edward?
> > >> > > >> > A separate package in examples, or a separate (online)
> > >> > > >> > machine learning module? Or something else?
> > >> > > >> > Regards,
> > >> > > >> > Tommaso
> > >> > > >> >
> > >> > > >> > 2012/5/25 Edward J. Yoon <[email protected]>
> > >> > > >> >
> > >> > > >> >> Okay, then let's get started.
> > >> > > >> >>
> > >> > > >> >> My first idea is a simple online recommendation system
> > >> > > >> >> based on click-stream data.
> > >> > > >> >>
> > >> > > >> >> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > >> > > >> >> <[email protected]> wrote:
> > >> > > >> >> > +1
> > >> > > >> >> >
> > >> > > >> >> > For those who are interested in ML, please check this
> > >> > > >> >> > out (GNU Octave is used):
> > >> > > >> >> >
> > >> > > >> >> > https://www.coursera.org/course/ml
> > >> > > >> >> >
> > >> > > >> >> > Another session is yet to be announced.
> > >> > > >> >> >
> > >> > > >> >> > Thanks,
> > >> > > >> >> > Praveen
> > >> > > >> >> >
> > >> > > >> >> > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > >> > > >> >> > [email protected]> wrote:
> > >> > > >> >> >
> > >> > > >> >> >> +1
> > >> > > >> >> >>
> > >> > > >> >> >> 2012/5/24 Tommaso Teofili <[email protected]>
> > >> > > >> >> >>
> > >> > > >> >> >> > and same here :)
> > >> > > >> >> >> >
> > >> > > >> >> >> > 2012/5/24 Vaijanath Rao <[email protected]>
> > >> > > >> >> >> >
> > >> > > >> >> >> > > +1 me too
> > >> > > >> >> >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi"
> > >> > > >> >> >> > > <[email protected]> wrote:
> > >> > > >> >> >> > >
> > >> > > >> >> >> > > > +1
> > >> > > >> >> >> > > > I would be happy to help :)
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J.
> > >> > > >> >> >> > > > Yoon <[email protected]> wrote:
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > > Hi,
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > Is anyone interested in online machine
> > >> > > >> >> >> > > > > learning?
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > --
> > >> > > >> >> >> > > > > Best Regards, Edward J. Yoon
> > >> > > >> >> >> > > > > @eddieyoon
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > --
> > >> > > >> >> >> > > > Cheers,
> > >> > > >> >> >> > > > Aditya Sarawgi
> > >> > > >> >> >>
> > >> > > >> >> >> --
> > >> > > >> >> >> Thomas Jungblut
> > >> > > >> >> >> Berlin <[email protected]>

--
Thomas Jungblut
Berlin <[email protected]>
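Suraj's superstep-library idea above — small, overridable superstep units chained by a runner, some of which skip the barrier (his RealTimeSuperstep) — could be sketched roughly as below. This is a minimal illustration only: the `Superstep`, `needsSync`, and `run` names are made up for this sketch and are not Hama's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperstepSketch {

  /** One unit of BSP work over shared per-task state S. */
  interface Superstep<S> {
    void compute(S state);
    /** A RealTimeSuperstep-style unit would return false to skip the barrier. */
    default boolean needsSync() { return true; }
  }

  /** Runs a fixed chain of supersteps, counting the barriers taken. */
  static <S> int run(List<Superstep<S>> chain, S state) {
    int syncs = 0;
    for (Superstep<S> step : chain) {
      step.compute(state);
      if (step.needsSync()) {
        syncs++; // a real runner would call peer.sync() here
      }
    }
    return syncs;
  }

  public static void main(String[] args) {
    List<Superstep<int[]>> chain = new ArrayList<>();
    chain.add(state -> state[0] += 1); // ordinary superstep: syncs after
    chain.add(new Superstep<int[]>() {
      @Override public void compute(int[] state) { state[0] *= 2; }
      @Override public boolean needsSync() { return false; } // no barrier
    });
    int[] state = {0};
    int syncs = run(chain, state);
    System.out.println("state=" + state[0] + " syncs=" + syncs); // state=2 syncs=1
  }
}
```

A machine-learning job could then be composed by plugging model-specific `compute` bodies into such a chain, which is exactly the override-and-reuse pattern Suraj describes.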
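Thomas's batch scheme (each task trains on its local block, all tasks exchange their weights, the weights are combined, and the next epoch starts) can be sketched in plain Java with a one-weight linear model standing in for the neural net. The class and method names here are invented for illustration, and averaging the exchanged weights is one plausible reading of "the weights are accumulated" — not necessarily what BatchBackpropagationBSP does.

```java
import java.util.Arrays;

public class WeightAveragingSketch {

  /** One local epoch of gradient descent on a peer's block of the data. */
  static double[] localEpoch(double[] w, double[][] x, double[] y, double lr) {
    double[] out = Arrays.copyOf(w, w.length);
    for (int i = 0; i < x.length; i++) {
      double pred = 0;
      for (int j = 0; j < out.length; j++) pred += out[j] * x[i][j];
      double err = pred - y[i]; // forward pass, error vs. observation
      for (int j = 0; j < out.length; j++) out[j] -= lr * err * x[i][j];
    }
    return out;
  }

  /** After "sync" each task holds #tasks weight vectors; combine them. */
  static double[] average(double[][] perTask) {
    double[] avg = new double[perTask[0].length];
    for (double[] w : perTask)
      for (int j = 0; j < w.length; j++) avg[j] += w[j] / perTask.length;
    return avg;
  }

  public static void main(String[] args) {
    // Two "tasks", each with its own block of y = 2 * x data.
    double[][] x1 = {{1}, {2}}, x2 = {{3}, {4}};
    double[] y1 = {2, 4}, y2 = {6, 8};
    double[] w = {0};
    for (int epoch = 0; epoch < 50; epoch++) {
      double[] wA = localEpoch(w, x1, y1, 0.01); // task A, local block
      double[] wB = localEpoch(w, x2, y2, 0.01); // task B, local block
      w = average(new double[][]{wA, wB});       // "send to all, accumulate"
    }
    System.out.println("learned weight: " + w[0]); // should be close to 2.0
  }
}
```

The `average` call is the single point where the superstep boundary sits: everything before it is embarrassingly parallel, which is why the pattern resembles the k-means intuition Thomas mentions.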
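Suraj's point about a peer communication strategy for MutualBroadcastSuperstep — every peer needs everyone's data without opening an RPC connection to every other peer — is essentially an all-gather. One standard strategy is a ring: each superstep, every peer forwards its newest item to one neighbour, and after p-1 steps everyone holds everything. The following is a plain-Java simulation of that idea under those assumptions, not Hama code.

```java
import java.util.ArrayList;
import java.util.List;

public class RingAllGatherSketch {

  /** Simulates p peers all-gathering their values over a ring. */
  static List<List<String>> allGather(List<String> initial) {
    int p = initial.size();
    List<List<String>> held = new ArrayList<>();
    for (String v : initial) {
      List<String> mine = new ArrayList<>();
      mine.add(v);
      held.add(mine);
    }
    // Each "superstep": every peer sends the item it received most
    // recently to its right neighbour, so each peer only ever opens a
    // connection to one other peer.
    for (int step = 0; step < p - 1; step++) {
      List<String> inFlight = new ArrayList<>();
      for (int i = 0; i < p; i++) {
        List<String> mine = held.get(i);
        inFlight.add(mine.get(mine.size() - 1)); // newest item
      }
      for (int i = 0; i < p; i++) {
        held.get((i + 1) % p).add(inFlight.get(i)); // deliver, then sync
      }
    }
    return held;
  }

  public static void main(String[] args) {
    List<List<String>> result = allGather(List.of("a", "b", "c"));
    System.out.println(result.get(0)); // peer 0 now holds all three values
  }
}
```

The trade-off is p-1 supersteps instead of one, so for small peer counts a direct broadcast may still be cheaper; the ring wins when connection fan-out is the bottleneck.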
