On Tuesday 25 March 2008, Marko Novakovic wrote:
> Other components will be classifier, crawler and
> indexer.

So it will be the typical setup: crawl web pages, classify them as positive or negative, and in the end index them correctly? I would be especially interested in how the classifier will be built, as far as you can share any such knowledge on a public mailing list before September '08.

> I have idea about architecture in which all
> components will be run at each machine.

I think the system architecture was pretty clear from the slides you sent. It would be nice if you could briefly sketch it on list, as the slides did not survive being sent to a mailing list :)

> My idea for clustering would be making relevance by
> properties, like repetition keywords on page, relevant
> tags, keyword in subject etc. For each property will
> be allocated one axis and from n-dimensional space
> clustering machine will group pages by proper
> algorithm, in my case k-Means.

If I understood the task correctly, the goal is to build a system that is capable of separating posts that express some opinion from objective ones, and afterwards of grouping positive vs. negative postings, right? I do not yet see how the k-means clustering algorithm helps you achieve this task.

> If you want I will be able to describe detailed
> relevance for clustering with proper examples
> tomorrow.

Sounds good.

Isabel

-- 
"Life sucks, but death doesn't put out at all...."  -- Thomas J. Kopp
  |\      _,,,---,,_        Web: <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)   IM:  <xmpp://[EMAIL PROTECTED]>
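
For reference, the quoted clustering idea (one axis per page property, then k-means over the resulting n-dimensional points) could look roughly like the sketch below. This is only an illustration under assumptions: the three features, their values, and the function name `kmeans` are hypothetical and not taken from the thread, and the centroids are seeded with the first k points purely to keep the example deterministic.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each page to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignment, centroids


# Hypothetical feature vectors, one axis per property:
# (keyword repetitions, relevant tags, keyword in subject as 0/1)
pages = [
    (9.0, 4.0, 1.0),
    (1.0, 0.0, 0.0),
    (8.0, 5.0, 1.0),
    (0.0, 1.0, 0.0),
]
labels, centers = kmeans(pages, k=2)
print(labels)
print(centers)
```

Note that k-means is unsupervised: with features like these it groups pages by how keyword-heavy they are, and nothing forces the resulting clusters to line up with positive vs. negative postings.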