Hi Manuel,

300k is small; I have one with 6 million clicks. However, it is more a question of interest and of which algorithms could be suitable for BSP. In case you wonder what BSP is, it stands for bulk synchronous parallel [1]. We think that real-time and strongly iterative algorithms that are slow in MapReduce could be solved more efficiently with BSP. If you're interested, let us know.
Regards,
Thomas

[1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

2012/5/25 Manuel Blechschmidt <[email protected]>

> Hi Edward,
> do you already have a test dataset?
>
> I might get one with about 300.000 clicks for you.
>
> It is from www.nelou.com and we are already running a recommender in
> preview mode:
> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
>
> It could be the case that you would have to sign an NDA. Would this be
> possible for you?
>
> /Manuel
>
> On 25.05.2012, at 10:34, Edward J. Yoon wrote:
>
> > OKay, I'm FWD this to mahout dev.
> >
> > I'm planning to create a project related to On-line machine learning,
> > as a Apache Hama sub-module. Since the graph of message queues and
> > workers could be implemented using BSP (see also [1]). The first idea
> > is On-line recommendation system based on click-stream data.
> >
> > If you have interested in this plan, let's talk together here.
> >
> > 1. http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> >
> > ---------- Forwarded message ----------
> > From: Thomas Jungblut <[email protected]>
> > Date: Fri, May 25, 2012 at 4:55 PM
> > Subject: Re: Online machine learning on top of Hama BSP
> > To: [email protected]
> >
> > Should we cooperate with the Mahout guys on this? I'm pretty sure they
> > would have fun with it.
> > Edward, do you want to ask them?
> >
> > 2012/5/25 Tommaso Teofili <[email protected]>
> >
> >> Do you have a plan for that Edward?
> >> A separate package in examples or a separate (online) machine learning
> >> module? Or something else?
> >> Regards
> >> Tommaso
> >>
> >> 2012/5/25 Edward J. Yoon <[email protected]>
> >>
> >>> OKay, then let's get started.
> >>>
> >>> My first idea is simple online recommendation system based on
> >>> click-stream data.
> >>>
> >>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> >>> <[email protected]> wrote:
> >>>> +1
> >>>>
> >>>> For those who are interested in ML, please check this. GNU Octave is
> >>>> used.
> >>>>
> >>>> https://www.coursera.org/course/ml
> >>>>
> >>>> Another session is yet to be announced.
> >>>>
> >>>> Thanks,
> >>>> Praveen
> >>>>
> >>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> 2012/5/24 Tommaso Teofili <[email protected]>
> >>>>>
> >>>>>> and same here :)
> >>>>>>
> >>>>>> 2012/5/24 Vaijanath Rao <[email protected]>
> >>>>>>
> >>>>>>> +1 me too
> >>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> +1
> >>>>>>>> I would be happy to help :)
> >>>>>>>>
> >>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Does anyone interesting in online machine learning?
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best Regards, Edward J. Yoon
> >>>>>>>>> @eddieyoon
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Cheers,
> >>>>>>>> Aditya Sarawgi
> >>>>>
> >>>>> --
> >>>>> Thomas Jungblut
> >>>>> Berlin <[email protected]>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >
> > --
> > Thomas Jungblut
> > Berlin <[email protected]>
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B

--
Thomas Jungblut
Berlin <[email protected]>
