Cool stuff! In my past experience with user click histories, I have modeled the clicked items as a large connected graph. The strength of the connection between any two items is the number of times they co-occurred across all users' click histories.
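Here is a minimal sketch of the co-occurrence graph described above. It is written as a map step and a reduce step so it mirrors how the counting would be split across a Hadoop cluster. It assumes a pair counts at most once per user history, so repeated clicks are collapsed. The names `map_history`, `reduce_pair`, and `run_local` are hypothetical, and `run_local` only simulates the shuffle in one process. None of this is taken from the MAHOUT-103 patch.

```python
from collections import defaultdict
from itertools import combinations


def map_history(user_id, clicked_items):
    """Mapper: emit ((a, b), 1) for every unordered pair of distinct items in
    one user's click history. user_id plays the role of the input key and is
    not otherwise used. Repeated clicks are collapsed, so a pair counts at
    most once per user (an assumption)."""
    items = sorted(set(clicked_items))  # canonical order, so (a, b) has a < b
    for a, b in combinations(items, 2):
        yield (a, b), 1


def reduce_pair(pair, counts):
    """Reducer: the edge weight is the total co-occurrence count."""
    return pair, sum(counts)


def run_local(histories):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    grouped = defaultdict(list)
    for user_id, clicks in histories.items():
        for key, value in map_history(user_id, clicks):
            grouped[key].append(value)
    return dict(reduce_pair(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    histories = {
        "u1": ["A", "B", "C"],
        "u2": ["A", "B"],
        "u3": ["B", "C", "B"],  # the repeated click on B counts once
    }
    for (a, b), weight in sorted(run_local(histories).items()):
        print(a, b, weight)
    # A B 2
    # A C 1
    # B C 2
```

A history with n distinct items emits n(n-1)/2 pairs. Very long histories therefore dominate the map output.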
The idea is simple, and it works quite well when processing large datasets on a medium-sized cluster. I also have a patch uploaded to Mahout (http://issues.apache.org/jira/browse/MAHOUT-103), though I haven't had enough time to get it into a committable shape.

Regards
-Ankur

----- Original Message -----
From: "Michal Laclavik"
Sent: Thursday, July 2, 2009 4:51:36 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: FYI, Large-scale graph computing at Google

Very interesting discussion...

We are processing social networks extracted from email communication, along with their connections to other objects extracted from the emails. Extracting the network works very well on Hadoop, but processing the graph is not that easy. We would like to implement a spreading activation algorithm over MR, but it is quite difficult. Has anyone tried spreading activation on Hadoop?

Michal

On Thu, Jul 2, 2009 at 12:36 PM, Edward J. Yoon wrote:
> Thanks. BTW, this link seems broken. Could you send me the paper? ;)
>
> Also, we've just begun to design Hamburg --
> http://wiki.apache.org/hadoop/Hamburg -- any comments are welcome.
>
> On Wed, Jul 1, 2009 at 1:38 AM, Delip Rao wrote:
>> We've had some success in dealing with locality problems using the
>> adjacency list representation. This could be serialized using frameworks
>> like Thrift or Protocol Buffers. For details, please see:
>> http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf
>>
>> I intend to continue this line of work and will be happy to help in any way.
>>
>> On Thu, Jun 25, 2009 at 8:24 PM, Edward J. Yoon wrote:
>>> I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug --
>>> Let's discuss the graph computing framework named Hamburg.
>>>
>>> On Fri, Jun 26, 2009 at 8:43 AM, Edward J. Yoon wrote:
>>> > To be honest, I had been considering BigTable (HBase) for MapReduce-based
>>> > graph/matrix operations. The main performance problems were the
>>> > sequential algorithm, the cost of building an MR job for every
>>> > iteration, and the locality of adjacent components. As mentioned
>>> > regarding Pregel, if an algorithm needs only small resources to get its
>>> > result, another computing framework on HDFS based on the BSP model
>>> > could be useful for us.
>>> >
>>> > On Fri, Jun 26, 2009 at 3:37 AM, Amandeep Khurana wrote:
>>> >> I've been working on some graph stuff using MR as well, and I'd be
>>> >> more than interested to chip in.
>>> >>
>>> >> I remember exchanging a few mails with Paolo about having an RDF store
>>> >> over HBase and developing graph algorithms over it.
>>> >>
>>> >> Amandeep Khurana
>>> >> Computer Science Graduate Student
>>> >> University of California, Santa Cruz
>>> >>
>>> >> On Thu, Jun 25, 2009 at 2:57 AM, Steve Loughran wrote:
>>> >>> Edward J. Yoon wrote:
>>> >>>> What do you think about another new computation framework on HDFS?
>>> >>>>
>>> >>>> On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon wrote:
>>> >>>>> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
>>> >>>>> -- It sounds like Pregel is a computing framework based on dynamic
>>> >>>>> programming for graph operations. I guess they may have removed the
>>> >>>>> file communication/intermediate files between iterations.
>>> >>>>>
>>> >>>>> Anyway, what do you think?
>>> >>>>
>>> >>> I have a colleague (Paolo) who would be interested in adding a set of
>>> >>> graph algorithms on top of the MR engine.
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon @ NHN, corp.
>>> > http://blog.udanax.org

--
Best regards,
Michal Laclavik
==
Institute of Informatics SAS
web: http://laclavik.net/
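Michal's question about spreading activation and Edward's point about the per-iteration cost of MR can both be illustrated with a small sketch. Each round of spreading activation is treated as one map/reduce pass over adjacency-list records. The mapper re-emits each node's record so the graph structure survives into the next round. If the node fires, the mapper also sends decayed activation to its neighbors. The reducer adds up the incoming activation. Several pieces are assumptions for illustration and were not proposed in the thread: the fire-once rule, the `DECAY` and `THRESHOLD` values, and all function names. A single Python process stands in for the cluster.

```python
from collections import defaultdict

# Hypothetical parameters: real spreading-activation variants differ.
DECAY = 0.5      # fraction of activation passed along an edge
THRESHOLD = 0.1  # minimum activation a node needs to fire


def map_node(node, record):
    """Mapper. record = (activation, fired, adjacency), where adjacency is a
    list of (neighbor, weight) pairs. A node fires at most once (assumed)."""
    activation, fired, adjacency = record
    fires_now = not fired and activation >= THRESHOLD
    # Re-emit the node itself so the reducer can rebuild the graph.
    yield node, ("node", (activation, fired or fires_now, adjacency))
    if fires_now:
        for neighbor, weight in adjacency:
            yield neighbor, ("delta", activation * weight * DECAY)


def reduce_node(node, values):
    """Reducer: restore the node record and add all incoming activation."""
    activation, fired, adjacency = 0.0, False, []
    incoming = 0.0
    for kind, payload in values:
        if kind == "node":
            activation, fired, adjacency = payload
        else:
            incoming += payload
    return node, (activation + incoming, fired, adjacency)


def run_round(graph):
    """One simulated MR job: map every node, group by key, reduce."""
    grouped = defaultdict(list)
    for node, record in graph.items():
        for key, value in map_node(node, record):
            grouped[key].append(value)
    return dict(reduce_node(key, values) for key, values in grouped.items())


def spread(graph, max_rounds=10):
    """Repeat rounds until nothing changes (no new firings) or max_rounds."""
    for _ in range(max_rounds):
        next_graph = run_round(graph)
        if next_graph == graph:
            break
        graph = next_graph
    return graph


if __name__ == "__main__":
    graph = {
        "A": (1.0, False, [("B", 1.0)]),
        "B": (0.0, False, [("C", 0.5)]),
        "C": (0.0, False, [("A", 1.0)]),
    }
    for node, (activation, fired, _) in sorted(spread(graph).items()):
        print(node, activation, fired)
    # A 1.0625 True
    # B 0.5 True
    # C 0.125 True
```

Every round re-reads and rewrites the whole graph, so an MR implementation pays the job setup and I/O cost once per round. That is the overhead Edward cites as a motivation for a BSP-style framework.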
