The clusterer will be one component of the search engine; the other components will be the classifier, the crawler, and the indexer. My idea for the architecture is that all components run on every machine. Web pages will be distributed to the machines by a hash function, which adapts when new machines are inserted or when working machines are removed or fail. Between the crawler and the rest of the system there will be a queue, from which pages are scheduled to machines by the hash.
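To make the idea concrete, here is a minimal sketch of such an adaptive hash assignment using consistent hashing, so that inserting or removing a machine only remaps the pages on the affected segment of the ring. The class and method names are my own invention for illustration, not part of any existing system:

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    """Stable hash of a string, independent of the Python process."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps page URLs to machines. Adding or removing a machine only
    remaps the pages whose hash falls on the affected ring segment."""

    def __init__(self, machines, replicas=100):
        self.replicas = replicas   # virtual points per machine, for balance
        self._points = []          # sorted hash points on the ring
        self._owner = {}           # hash point -> machine name
        for m in machines:
            self.add_machine(m)

    def add_machine(self, machine):
        for i in range(self.replicas):
            p = _hash("%s#%d" % (machine, i))
            self._owner[p] = machine
            self._points.insert(bisect_right(self._points, p), p)

    def remove_machine(self, machine):
        for i in range(self.replicas):
            p = _hash("%s#%d" % (machine, i))
            del self._owner[p]
            self._points.remove(p)

    def machine_for(self, url):
        """First ring point clockwise from the URL's hash owns the page."""
        p = _hash(url)
        idx = bisect_right(self._points, p) % len(self._points)
        return self._owner[self._points[idx]]
```

With a plain `hash(url) % n_machines` scheme, changing the machine count would remap almost every page; with the ring, only pages owned by the removed or added machine move.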
My idea for the clustering is to derive relevance from page properties, such as the number of keyword repetitions on the page, relevant tags, the keyword appearing in the subject, and so on. Each property gets one axis, and in the resulting n-dimensional space the clustering machine groups pages with a suitable algorithm, in my case k-Means. If you want, I can describe the relevance properties for clustering in detail, with examples, tomorrow.

Greetings

--- Isabel Drost <[EMAIL PROTECTED]> wrote:

> On Monday 24 March 2008, Marko Novakovic wrote:
> > and I am interested in implementing this clustering
> > algorithm on the Hadoop platform.
>
> So you would like to get a distributed clustering algorithm
> for grouping search results? It would be nice to hear more
> about your approach to this problem.
>
> There are a few guys here who have been working on clustering
> search results already. I guess they might be able to provide
> some help as well.
>
> We already have a k-Means implementation, but so far it has
> not been integrated into a search result clustering context.
>
> Isabel
>
> --
> Science is what happens when preconception meets verification.
>  |\      _,,,---,,_        Web: <http://www.isabel-drost.de>
>  /,`.-'`'    -.  ;-;;,_
> |,4-  ) )-,_..;\ (  `'-'
> '---''(_/--'  `-'\_) (fL)  IM: <xmpp://[EMAIL PROTECTED]>
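P.S. The one-axis-per-property idea could be sketched like this in plain Python. The page fields (`title`, `body`, `tags`), the chosen features, and the tiny k-Means loop are all hypothetical examples of the approach, not the Mahout implementation:

```python
import math
import random

def page_features(page: dict, keyword: str) -> list:
    """One axis per relevance property (example properties only)."""
    kw = keyword.lower()
    return [
        float(page["body"].lower().count(kw)),    # keyword repetitions on page
        1.0 if kw in page["title"].lower() else 0.0,  # keyword in subject/title
        float(len(page.get("tags", []))),         # number of relevant tags
    ]

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-Means over the n-dimensional feature space."""
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = [
            [sum(axis) / len(c) for axis in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

Each page becomes a point in the n-dimensional property space, and k-Means groups nearby points, i.e. pages with similar relevance profiles.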