I have told him that he could use it, but he uses a different approach. You said that we can merge later, when he is ready. First come, first served.
2012/7/10 Edward J. Yoon <[email protected]>

My concern is that this looks like duplicated effort with Miklai. I think it needs to be organized.

On Tue, Jul 10, 2012 at 8:26 PM, Thomas Jungblut <[email protected]> wrote:

Splitting out a math module would be smarter, but let's just keep that in the ML package.

Anyone volunteer to code a simple (mini-)batch gradient descent in BSP?
http://holehouse.org/mlclass/17_Large_Scale_Machine_Learning.html

2012/7/10 Edward J. Yoon <[email protected]>

I would like to move it to the core module so that others can reuse it.

On Tue, Jul 10, 2012 at 7:13 PM, Tommaso Teofili <[email protected]> wrote:

I've done the first import, we can start from that now. Thanks Thomas.
Tommaso

2012/7/10 Tommaso Teofili <[email protected]>

Ok, I'll try that, thanks :)
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

I don't know if we need sparse/named vectors for the first cut. You can just use the interface and the dense implementations, and remove all the uncompilable code in the writables.

2012/7/10 Tommaso Teofili <[email protected]>

Thomas, while inspecting the code I realized I may need to import most/all of the classes inside your math library for the writables to compile. Is that ok for you, or would you rather I didn't?
Regards,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Great, thank you for taking care of it ;)

2012/7/10 Tommaso Teofili <[email protected]>

Ok, sure, I'll just add the writables along with DoubleMatrix/Vector with the AL2 headers on top.
Thanks Thomas for the contribution and feedback.
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Feel free to commit this, but take care to add the Apache license headers. I also wanted to add a few test cases over the next few weekends.

2012/7/10 Tommaso Teofili <[email protected]>

Nice idea; thinking about it quickly, it looks to me like (C)GD is a good fit for BSP.
I was also trying to implement some easy meta-learning algorithm, like the weighted majority algorithm, where each peer has its own learning algorithm and gets penalized for each mistaken prediction.
Regarding your math library, do you plan to commit it yourself? Otherwise I can do it.
Regards,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Maybe a good first step towards algorithms would be to evaluate how we can implement some non-linear optimizers in BSP (BFGS or the conjugate gradient method).

2012/7/9 Tommaso Teofili <[email protected]>

2012/7/9 Thomas Jungblut <[email protected]>

For the matrix/vector I would propose my library's interface (quite like Mahout's math, but without boundary checks):

https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java

Full Writable for Vector and a basic Writable for Matrix:

https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable

It is enough to implement all the machine learning algorithms I've seen so far, and the builder pattern allows really nice chaining of commands to easily code equations or translate code from Matlab/Octave.
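The builder-style chaining described above can be illustrated with a minimal sketch. Note that `Vec` and its method names here are hypothetical stand-ins for illustration only, not the actual tjungblut-math API:

```java
// Minimal illustrative vector with chainable operations.
// Class and method names are hypothetical, not taken from tjungblut-math.
final class Vec {
    private final double[] d;

    Vec(double... d) { this.d = d.clone(); }

    // Element-wise addition; returns a new vector so calls can be chained.
    Vec add(Vec o) {
        double[] r = new double[d.length];
        for (int i = 0; i < d.length; i++) r[i] = d[i] + o.d[i];
        return new Vec(r);
    }

    // Scalar multiplication; also returns a new vector.
    Vec multiply(double s) {
        double[] r = new double[d.length];
        for (int i = 0; i < d.length; i++) r[i] = d[i] * s;
        return new Vec(r);
    }

    // Dot product terminates a chain with a scalar.
    double dot(Vec o) {
        double sum = 0;
        for (int i = 0; i < d.length; i++) sum += d[i] * o.d[i];
        return sum;
    }
}

public class ChainingExample {
    public static void main(String[] args) {
        Vec x = new Vec(1, 2, 3);
        Vec y = new Vec(4, 5, 6);
        // Octave: (x * 2 + y)' * y  becomes one chained expression:
        double result = x.multiply(2).add(y).dot(y);
        System.out.println(result); // (2+4)*4 + (4+5)*5 + (6+6)*6 = 141.0
    }
}
```

The design choice that makes this work is immutability: every operation returns a fresh vector, so expressions compose left to right like the equations they translate.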
See for example the logistic regression cost function:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java

2012/7/9 Tommaso Teofili <[email protected]>

Very nice, +1!

2012/7/9 Thomas Jungblut <[email protected]>

For the interfaces of the algorithms: I guess we need to get some more experience. I cannot tell how the interfaces for them should look, mainly because I don't know how the BSP versions of them will call the algorithm logic.

2012/7/9 Tommaso Teofili <[email protected]>

You're right, it's more reasonable to just proceed bottom-up with this, as we're going to have a clearer idea while developing the different algorithms. So for now I'd introduce your library's Writables and then proceed one step at a time with the more common API.
Thanks,
Tommaso

2012/7/9 Thomas Jungblut <[email protected]>

But having stable math interfaces is the key point.
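For reference, the standard unregularized logistic regression cost behind such a class is J(θ) = -(1/m) Σ [y·ln h + (1−y)·ln(1−h)] with h = sigmoid(θᵀx). A minimal standalone sketch of that formula (class and method names are illustrative, not the linked code):

```java
// Standard (unregularized) logistic regression cost, sketched for reference.
// This is the textbook formula, not the code behind the linked class.
public class LogisticCost {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // theta: parameters; x: m rows of features; y: m labels in {0, 1}
    static double cost(double[] theta, double[][] x, double[] y) {
        int m = x.length;
        double sum = 0.0;
        for (int i = 0; i < m; i++) {
            double z = 0.0;
            for (int j = 0; j < theta.length; j++) z += theta[j] * x[i][j];
            double h = sigmoid(z);
            // cross-entropy term for one example
            sum += y[i] * Math.log(h) + (1 - y[i]) * Math.log(1 - h);
        }
        return -sum / m;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {1, -2}};
        double[] y = {1, 0};
        double[] theta = {0, 0};
        // With theta = 0, h = 0.5 for every example, so J = ln 2 ~= 0.6931
        System.out.println(cost(theta, x, y));
    }
}
```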
2012/7/9 Tommaso Teofili <[email protected]>

Ok, so let's sketch up here what these interfaces should look like. Any proposal is more than welcome.
Regards,
Tommaso

2012/7/7 Thomas Jungblut <[email protected]>

Looks fine to me. The key is the interfaces for learning and predicting, so we should define some vectors and matrices. It would be enough to define the algorithms via the interfaces, and a generic BSP should just run them based on the given input.

2012/7/7 Tommaso Teofili <[email protected]>

Hi all,

in my spare time I started writing some basic BSP-based machine learning algorithms for our ml module, and now I'm wondering, from a design point of view, where it would make sense to put the training data / model. I'd assume the obvious answer is HDFS, which makes me think we should come up with (at least) two BSP jobs for each algorithm: one for learning and one for "predicting", each to be run separately.
This would allow us to read the training data from HDFS, consequently create a model (also on HDFS), and then read the created model (again from HDFS) in order to predict an output for a new input.
Does that make sense? I'm just wondering what a general-purpose design for Hama-based ML would look like, so this is just to start the discussion; any opinion is welcome.

Cheers,
Tommaso

--
Best Regards, Edward J. Yoon
@eddieyoon
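Stripped of any BSP machinery, the (mini-)batch gradient descent raised in this thread reduces to a simple repeated update. The single-process Java sketch below only shows the math; in the learn/predict design discussed here, one would presumably have each peer compute a partial gradient over its own input split and combine the partials by messaging before the update, but that distribution strategy is an assumption, not an implementation:

```java
public class MiniBatchGD {
    // One mini-batch update for linear regression:
    // theta <- theta - (alpha / b) * sum_i (h(x_i) - y_i) * x_i
    static void update(double[] theta, double[][] xBatch, double[] yBatch, double alpha) {
        int b = xBatch.length;
        double[] grad = new double[theta.length];
        for (int i = 0; i < b; i++) {
            double h = 0;
            for (int j = 0; j < theta.length; j++) h += theta[j] * xBatch[i][j];
            double err = h - yBatch[i];
            for (int j = 0; j < theta.length; j++) grad[j] += err * xBatch[i][j];
        }
        for (int j = 0; j < theta.length; j++) theta[j] -= alpha * grad[j] / b;
    }

    public static void main(String[] args) {
        // Toy data: y = 2x, split into two mini-batches of size 2.
        double[][] x = {{1}, {2}, {3}, {4}};
        double[] y = {2, 4, 6, 8};
        double[] theta = {0};
        for (int epoch = 0; epoch < 100; epoch++) {
            update(theta, new double[][]{x[0], x[1]}, new double[]{y[0], y[1]}, 0.05);
            update(theta, new double[][]{x[2], x[3]}, new double[]{y[2], y[3]}, 0.05);
        }
        System.out.println(theta[0]); // converges to the true slope 2.0
    }
}
```

Each update only needs a sum of per-example gradients, which is exactly the kind of aggregate a BSP superstep can collect, which is presumably why it came up as a first candidate algorithm.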
