Re: Google Summer of Code

Isabel Drost Mon, 24 Mar 2008 23:47:33 -0700

On Tuesday 25 March 2008, Marko Novakovic wrote:
> Other components will be classifier, crawler and
> indexer.


So it will be the typical setup: crawl web pages, classify them as positive or
negative, and in the end index them correctly? I would be especially
interested in how the classifier will be built, as far as you can share any
such knowledge on a public mailing list before September '08.


> I have an idea about an architecture in which all
> components will run on each machine.

I think the system architecture was pretty clear from the slides you sent. It
would be nice if you could briefly sketch it on list, as the slides have not
survived being sent to a mailing list :)


> My idea for clustering would be to define relevance by
> properties, like keyword repetition on the page, relevant
> tags, keyword in the subject, etc. Each property will
> be allocated one axis, and in the resulting n-dimensional
> space the clustering machine will group pages with a
> suitable algorithm, in my case k-Means.

If I understood the task correctly, the goal is to build a system that is
capable of separating posts that express some opinion from objective ones and
afterwards grouping positive vs. negative postings, right?

I do not yet see how the clustering algorithm k-means helps you achieve this
task.
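
To make sure I read the proposal correctly, here is a minimal sketch of how I
understand it: each page becomes a point with one axis per property, and
k-means groups the points. The three properties, the sample values, and the
function name kmeans are my own assumptions for illustration, not anything
taken from the slides; the centroids are simply seeded with the first k pages.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each page goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical pages, one axis per property (assumed for illustration):
# (keyword repetitions, relevant tags, keyword in subject as 0/1)
pages = [
    (9.0, 4.0, 1.0),
    (1.0, 0.0, 0.0),
    (8.0, 5.0, 1.0),
    (0.0, 1.0, 0.0),
]
labels, centroids = kmeans(pages, k=2)
print(labels)
```

As far as I can see, the groups that come out of such a setup reflect how
keyword-heavy a page is. Nothing in these axes encodes whether a post is an
opinion, or whether it is positive or negative.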


> If you want I will be able to describe detailed
> relevance for clustering with proper examples
> tomorrow.

Sounds good.

Isabel


-- 
"Life sucks, but death doesn't put out at all...."      -- Thomas J. Kopp
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>
