Re: carrot2 question too - Re: Fun with the Wikipedia

Akmal Sarhan Fri, 28 Jan 2005 07:19:14 -0800

Hello,

we have been experimenting with carrot2 and are very pleased so far,
only one issue: there is no release not even an alpha one and the
dependencies seemed to be patched (jama)
is there any intentions to have any releases in the near future?


thanks 

Akmal
Am Montag, den 17.01.2005, 10:15 +0100 schrieb Dawid Weiss:
> Hi David,
> 
> I apologize about the delay in answering this one, Lucene is a busy 
> mailing list and I had a hectic last week... Again, sorry for belated 
> answer, hope you still find it useful.
> 
> >> That is awesome and very inspirational!
> 
> Yes, I admit what you've done with Wikipedia is quite interesting and 
> looks very good. I'm also glad you spent some time working out Carrot 
> integration with Lucene. It works quite nice.
> 
> >> Carrot2 looks very interesting. Wondering if anybody has a list of all 
> >> the
> > 
> > Technically I don't think carrot2 uses lucene per-se- it's just that you 
> > can integrate the two, and ditto for Nutch - it has code that uses Carrot2.
> 
> Yes, this is true. Carrot2 doesn't use all of Lucene's potential -- it 
> merely takes the output from a query (titles, urls and snippets) and 
> attempts to cluster them into some sensible groups. I think many things 
> could be improved, the most important of them is fast snippet retrieval 
>    from Lucene because right now it takes 50% of the time of the 
> clustering; I've seen a post a while ago describing a faster snippet 
> generation technique, I'm sure that would give clustering a huge boost 
> speed-wise.
> 
> > And here's my question. I reread the Carrot2<->Lucene code, esp 
> > Demo.java, and there's this fragment:
> > 
> >     // warm-up round (stemmer tables must be read etc).
> >     List clusters = clusterer.clusterHits(docs);
> > 
> >     long clusteringStartTime = System.currentTimeMillis();
> >     clusters = clusterer.clusterHits(docs);
> >     long clusteringEndTime = System.currentTimeMillis();
> > 
> > Thus it calls clusterHits() twice.
> > 
> > I don't really understand how to use Carrot2 - but I think the above is 
> > just for the sake of benchmarking clusterHits() w/o the effect of 1-time 
> > initialization - and that there's no benefit of repeatedly calling 
> > clusterHits (where a benefit might be that it can find nested clusters 
> > or whatever) - is that right (that there's no benefit)?
> 
> No, there is absolutely no benefit from it. It was merely to show people 
> that the clustering needs to be warmed up a bit. I should not have put 
> it in the code knowing people would be confused by it. You can safely 
> use clusterHits just once. It will just have a small delay at the first 
> invocation.
> 
> 
> Thanks for experimenting. Please BCC me if you have any urgent projects 
> -- I read Lucene's list in batches and my personal e-mail I try to keep 
> up to date with.
> 
> Dawid
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> !EXCUBATOR:41eb81f8156071530375633!
> 
-- 
Akmal Sarhan <[EMAIL PROTECTED]>
ByteAction GmbH


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: carrot2 question too - Re: Fun with the Wikipedia

Reply via email to