I registered myself just now, an interesting website. It seems crossfeeds generate a tag cloud offline hourly ? But I have a more strict time requirement. user submit a query in my website, and they may get tens of thousands of search results. I need to generate a tag cloud for all these document returned just during seconds of time. I think your solution might can fulfill this,if the indexing process and term ordering process were totally finish in memory..
----- Original Message ----- From: "Dominique Béjean" <[EMAIL PROTECTED]> To: <java-user@lucene.apache.org> Sent: Tuesday, April 01, 2008 8:30 PM Subject: RE: Problems about using Lucene to generate tag cloud.. On www.crossfeeds.com, I use this method in order to update hourly a tag cloud based on the title of 20.000 RSS articles (RSS published during the last 24 hours). It takes 1 minute. -----Message d'origine----- De : wuqi [mailto:[EMAIL PROTECTED] Envoyé : mardi 1 avril 2008 14:10 À : java-user@lucene.apache.org Objet : Re: Problems about using Lucene to generate tag cloud.. so build a index for the dynamically generated docucements set ,and then try to find frequency for each terms in this index... not sure it's fast enoug.but it's worth to have a try... Thank you Doinique! ----- Original Message ----- From: "Dominique Béjean" <[EMAIL PROTECTED]> To: <java-user@lucene.apache.org> Sent: Tuesday, April 01, 2008 3:51 PM Subject: RE: Problems about using Lucene to generate tag cloud.. May be you can index the set of documents in a temporary index. This index needs only one field (tag). Then you can browse the terms collection of the index and get each couple term/frequency IndexReader reader = IndexReader.open(temp_index); TermEnum terms = reader.terms(); while (terms.next()) { String field = terms.term().field(); if (!"tag".equals(field)) continue; String term = terms.term().text(); int freq = terms.docFreq(); } terms.close(); reader.close(); -----Message d'origine----- De : wuqi [mailto:[EMAIL PROTECTED] Envoyé : lundi 31 mars 2008 09:07 À : java-user@lucene.apache.org Objet : Problems about using Lucene to generate tag cloud.. Hi, I am trying to use Lucene index to implement a tag cloud system. I add a new field named "tags" in index to store all the tags,and we don't support tags with more than one word, so different tags of the same document just are separate by white space. The "tags" filed in one document may looks like this : doc1 tags : travel Beijing news doc2 tags: beijing sports news I can easily retrieve tags related with single document,and also get the documents related with certain tag, but it's hard find a "efficient" way to get frequent tags from a "set" of documents of this index.Tthe set of the documents is always generated dynamically, may be a search result, a dynamically generated category through clustering. The document set is very large, maybe several ten thousands or several hundred thousands.So simply iterate all the documents in the set and find the frequent tags might not be applicable.Do you have any better idea ? Thanks -Qi --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]