On 10/5/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
>
> > I am planning to take a closer look at the Carrot2 implementation and expose
> > the other algorithms to the user,
>
> That's actually quite simple -- I was planning to do it, but have no
> time at the moment. The current Carrot2 code in Nutch is a preconfigured
> process which uses the open source Lingo clustering algorithm to cluster
> documents. But in the codebase of Carrot2 there is now a scriptable
> controller, so you could basically have external scripts configuring
> several different algorithms. It really isn't that difficult. If you
> need any help, let me know -- private e-mail or the newsgroup, whatever.


That would be great. I have already looked at the code base in the plug-in
directory, and it seems you use this call to get the clustering results:

controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
Am I right?

Anyway, I want the clustering algorithm to be picked up from the XML file
instead of being hard-coded; that should be easy to do.
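
A rough sketch of what I have in mind, only to make the idea concrete. The
property name, the <property>/<name>/<value> layout and the class name are my
own assumptions, not existing Nutch or Carrot2 settings; the default is the
process id from the call above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ProcessIdConfig {
    // Hypothetical property name; not an existing Nutch or Carrot2 setting.
    static final String PROPERTY = "extension.clustering.carrot2.process-id";
    // Fallback: the process id currently passed to controller.query().
    static final String DEFAULT_PROCESS = "lingo-nmf-km-3";

    // Returns the configured process id, or the default if the property is absent or empty.
    static String readProcessId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String value = text(p, "value");
                if (PROPERTY.equals(text(p, "name")) && value != null && !value.isEmpty()) {
                    return value;
                }
            }
            return DEFAULT_PROCESS;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if there is none.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<conf><property><name>" + PROPERTY + "</name>"
            + "<value>" + DEFAULT_PROCESS + "</value></property></conf>";
        System.out.println(readProcessId(xml));
    }
}
```

The plug-in would then pass the returned value as the first argument of
controller.query(...) in place of the hard-coded string, assuming that
argument is indeed what selects the process.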

Any guidelines or ideas are welcome.


> > changes to the algorithm(s) so that speed wise be as good as Vivisimo (not
> > only interface wise ;-)).
>
> We don't know what the Vivisimo algorithm is really like in terms of speed.
> Its authors and co-founders are excellent researchers, so I guess it
> will be a tough beast to beat :) But of course we don't have any reason
> to be ashamed -- the open source version is quite decent.


That's the spirit, and it is going to get better ;-).

> In the
> commercial version we refactored the codebase and added an optional
> native matrix computation library. The speedup is significant (which
> matters only if your servers are really under a lot of load).
>
> Dawid


Cheers,
R.
