Thanks for the advice, guys!

Currently I am iterating through about 200-300 of the top docs and creating the groups from those (so, as of now, the groups are partial). My response time HAS to be at most 500-600 ms (query + groupings), or my company will probably go with a commercial search engine such as FAST or something of the sort.

The data I must prove Lucene can handle is up to 25 million docs of 2-4 KB each. I don't think it is feasible to create groups on a result set consisting of a few thousand results in half a second, or am I wrong?

I will try both solutions and hope for the best. In any case, many thanks!
Chris Hostetter <[EMAIL PROTECTED]> wrote:

An approach like the one Mark is describing should be a lot more space efficient than the BitSet intersection approach I described before, but depending on how many groupings you want, I can imagine that it might be slower in some cases. Unfortunately, it also only works if the groupings you want are tied directly to field values (in my case, I need to support ranges, prefix queries, and boolean queries for each grouping item).

Along a similar line of thinking: if the fields you want to group by are non-tokenized (i.e. only one value per doc), then you can iterate over the set bits from your original search and look up the value for each matching doc using the FieldCache. Not sure if that would be more space/time efficient than looping over the TermEnum/TermDocs ... I guess it depends on the average size of your results, the average size of your index, and the average number of terms in the fields you want to group by.

: Date: Mon, 30 Jan 2006 17:45:10 +0000 (GMT)
: From: mark harwood
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: RE: grouping results by fields
:
: > A simple solution if you only have 20,000 docs is just to iterate
: > through the hits and count them up against each color etc,
:
: The one thing to avoid is reader.document() calls in such a tight loop.
: This is always a killer.
:
: The best way I've found is to create one bitset for all the matching
: docs, then use TermEnum on the "group" field(s) to find all the docIDs,
: then check each docID against the "matches" bitset to accumulate
: scores for each unique "group" field value:
:
: // Position the enumeration at the first term of the group field.
: TermEnum te = reader.terms(new Term(groupFieldName, ""));
: Term term = te.term();
: while (term != null)
: {
:     if (term.field().equals(groupFieldName))
:     {
:         TermDocs termDocs = reader.termDocs(term);
:         GroupTotal groupTotal = null;
:
:         while (termDocs.next())
:         {
:             int docID = termDocs.doc();
:             if (queryMatchedDocs.get(docID))
:             {
:                 if (groupTotal == null)
:                 {
:                     // look up the group key and initialize
:                     String termText = term.text();
:                     Object key = termText;
:                     groupTotal = (GroupTotal) totals.get(key);
:                     if (groupTotal == null)
:                     {
:                         // no totals exist yet, create new one.
:                         groupTotal = new GroupTotal(key);
:                         totals.put(key, groupTotal);
:                     }
:                 }
:                 groupTotal.addQueryMatchDoc(docID);
:             }
:         }
:         termDocs.close();
:     }
:     else
:     {
:         // Terms are sorted by field, so we have run past the group field.
:         break;
:     }
:     if (te.next())
:     {
:         term = te.term();
:     }
:     else
:     {
:         break;
:     }
: }
: te.close();
:
: Cheers
: Mark
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:

-Hoss
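A minimal sketch of the FieldCache variant Hoss describes above, written in plain Java so it runs without Lucene on the classpath. The String[] stands in for the one-value-per-doc array that a FieldCache lookup would give you (in Lucene of this era that would be something like FieldCache.DEFAULT.getStrings(reader, field), but treat that mapping as an assumption). The class name FieldValueGroupCounter and the sample data are made up for illustration and are not code from the thread. It uses Java 8 library calls for brevity.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
class FieldValueGroupCounter {

    /**
     * Counts matching docs per field value.
     *
     * @param matches    bit i is set when doc i matched the query
     * @param valueByDoc valueByDoc[i] is the single field value of doc i
     *                   (assumed stand-in for a FieldCache array);
     *                   null means the doc has no value for the field
     */
    static Map<String, Integer> countGroups(BitSet matches, String[] valueByDoc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // nextSetBit visits only the matching docs, so the work grows with
        // the size of the result set rather than the size of the index.
        for (int doc = matches.nextSetBit(0);
             doc >= 0 && doc < valueByDoc.length;
             doc = matches.nextSetBit(doc + 1)) {
            String value = valueByDoc[doc];
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up data: doc 3 has no color, docs 0, 2, 3 and 5 matched.
        String[] colorByDoc = {"red", "blue", "red", null, "green", "blue", "red"};
        BitSet matches = new BitSet(colorByDoc.length);
        matches.set(0);
        matches.set(2);
        matches.set(3);
        matches.set(5);
        System.out.println(countGroups(matches, colorByDoc)); // prints {red=2, blue=1}
    }
}
```

The trade-off against the TermEnum/TermDocs loop is the one Hoss mentions: this walks the matches once and does an array lookup per hit, but it needs the whole per-doc value array in memory and only works when each doc has a single value for the field.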