The clusterer will be one component of the search engine; the other components will be the classifier, the crawler, and the indexer. My idea for the architecture is that all components run on every machine. Web pages will be distributed to the machines by a hash function, which adapts when new machines are inserted or when working machines are removed or fail. Between the crawler and the rest of the system there will be a queue, from which pages are scheduled to machines by the hash.
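To make the idea concrete, here is a minimal sketch of such an adaptive hash assignment using consistent hashing, so that inserting or removing a machine only remaps the pages on the affected segment of the ring. The class and method names are my own invention for illustration, not part of any existing system:

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    """Stable hash of a string, independent of the Python process."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps page URLs to machines. Adding or removing a machine only
    remaps the pages whose hash falls on the affected ring segment."""

    def __init__(self, machines, replicas=100):
        self.replicas = replicas   # virtual points per machine, for balance
        self._points = []          # sorted hash points on the ring
        self._owner = {}           # hash point -> machine name
        for m in machines:
            self.add_machine(m)

    def add_machine(self, machine):
        for i in range(self.replicas):
            p = _hash("%s#%d" % (machine, i))
            self._owner[p] = machine
            self._points.insert(bisect_right(self._points, p), p)

    def remove_machine(self, machine):
        for i in range(self.replicas):
            p = _hash("%s#%d" % (machine, i))
            del self._owner[p]
            self._points.remove(p)

    def machine_for(self, url):
        """First ring point clockwise from the URL's hash owns the page."""
        p = _hash(url)
        idx = bisect_right(self._points, p) % len(self._points)
        return self._owner[self._points[idx]]
```

With a plain `hash(url) % n_machines` scheme, changing the machine count would remap almost every page; with the ring, only pages owned by the removed or added machine move.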
My idea for the clustering is to derive relevance from page properties, such as the number of keyword repetitions on the page, relevant tags, the keyword appearing in the subject, and so on. Each property gets one axis, and in the resulting n-dimensional space the clustering machine groups pages with a suitable algorithm, in my case k-Means. If you want, I can describe the relevance properties for clustering in detail, with examples, tomorrow.

Greetings

--- Isabel Drost <[EMAIL PROTECTED]> wrote:

> On Monday 24 March 2008, Marko Novakovic wrote:
> > and I am interested in implementing this clustering
> > algorithm on the Hadoop platform.
>
> So you would like to get a distributed clustering algorithm
> for grouping search results? It would be nice to hear more
> about your approach to this problem.
>
> There are a few guys here who have been working on clustering
> search results already. I guess they might be able to provide
> some help as well.
>
> We already have a k-Means implementation, but so far it has
> not been integrated into a search result clustering context.
>
> Isabel
>
> --
> Science is what happens when preconception meets verification.
>  |\      _,,,---,,_        Web: <http://www.isabel-drost.de>
>  /,`.-'`'    -.  ;-;;,_
> |,4-  ) )-,_..;\ (  `'-'
> '---''(_/--'  `-'\_) (fL)  IM: <xmpp://[EMAIL PROTECTED]>
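P.S. The one-axis-per-property idea could be sketched like this in plain Python. The page fields (`title`, `body`, `tags`), the chosen features, and the tiny k-Means loop are all hypothetical examples of the approach, not the Mahout implementation:

```python
import math
import random

def page_features(page: dict, keyword: str) -> list:
    """One axis per relevance property (example properties only)."""
    kw = keyword.lower()
    return [
        float(page["body"].lower().count(kw)),    # keyword repetitions on page
        1.0 if kw in page["title"].lower() else 0.0,  # keyword in subject/title
        float(len(page.get("tags", []))),         # number of relevant tags
    ]

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-Means over the n-dimensional feature space."""
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = [
            [sum(axis) / len(c) for axis in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

Each page becomes a point in the n-dimensional property space, and k-Means groups nearby points, i.e. pages with similar relevance profiles.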