Hi Mark,

This is a good idea I hadn't thought of. But I don't think it will work in my case, since I need the actual whole field values (i.e. "Sun Microsystems"), not just the tokens ([sun] or [microsystems]). It might work if the fields happened to be indexed as keywords, but in my case they are not.
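To make the distinction concrete, here is a minimal sketch in plain Java. It makes no Lucene calls; the class name, method names and sample values are made up for illustration, and whitespace splitting plus lowercasing only stands in for whatever a real analyzer would do. It shows why totalling over the terms of a tokenized field gives per-token counts, while a keyword-style (untokenized) field keeps the whole value as the group key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this does not use the Lucene API.
final class FieldGrouping {

    // Roughly what you get when grouping on the terms of a tokenized field:
    // one group per token, each document counted once per distinct token.
    static Map<String, Integer> countByToken(List<String> fieldValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : fieldValues) {
            // Stand-in for an analyzer: lowercase and split on whitespace.
            Set<String> tokens = new LinkedHashSet<>(
                    Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+")));
            for (String token : tokens) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Roughly what you get when the field is indexed as a single keyword term:
    // one group per complete field value.
    static Map<String, Integer> countByWholeValue(List<String> fieldValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample field values, one per document.
        List<String> values = Arrays.asList(
                "Sun Microsystems", "Sun Microsystems", "Sun Chemical");
        System.out.println(countByToken(values));      // {sun=3, microsystems=2, chemical=1}
        System.out.println(countByWholeValue(values)); // {Sun Microsystems=2, Sun Chemical=1}
    }
}
```

With the tokenized grouping, "Sun Microsystems" never appears as a key, which is the problem in my case.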
-chris

On Thu, 10 Mar 2005 07:58:59 +0000 (GMT), mark harwood <[EMAIL PROTECTED]> wrote:
> >>To get complete statistics like above, you currently have to
> >>iterate through the result set and pull each Document from the Hits.
>
> Not necessarily true. You can use TermVectors or an indexed field
> eg "doctype" to derive this stuff without stored fields. Here's an
> example of how I've done it before using indexed fields. I've been
> meaning to tidy this up and contribute this as it looks like it
> could be generally useful. The "GroupingKeyFactory" is an abstraction
> which allows you to process a term before using it for totalling
> eg to group dates on a year rather than a full date.
>
> protected GroupTotal[] groupByIndexTokens(GroupQueryParams params)
>         throws ParseException, IOException
> {
>     final HashMap totals = new HashMap();
>     final GroupingKeyFactory groupKeyFactory = params.getGroupKeyFactory();
>     String groupFieldName = params.getGroupFieldName();
>     //TODO IndexSearcher should be passed in and reused?
>     IndexSearcher searcher = new IndexSearcher(reader);
>     float minScore = params.getMinDocScore();
>     // Sized by maxDoc(), not numDocs(): doc IDs can exceed numDocs()
>     // when the index contains deleted documents.
>     final float scores[] = new float[reader.maxDoc()];
>     String queryString = params.getQuery();
>
>     if ((queryString == null) || (queryString.trim().length() == 0))
>     {
>         //TODO if query is null then we could optimise counting by just
>         // taking docFreq from TermEnum and avoiding use of TermDocs?
>         Arrays.fill(scores, 1);
>     }
>     else
>     {
>         Query query = QueryParser.parse(queryString, "contents", analyzer);
>         searcher.search(query, null, new HitCollector()
>         {
>             public void collect(int docID, float score)
>             {
>                 scores[docID] = score;
>             }
>         });
>     }
>
>     // Walk the terms of the grouping field, starting at its first term.
>     TermEnum te = reader.terms(new Term(groupFieldName, ""));
>     Term term = te.term();
>     while (term != null)
>     {
>         if (term.field().equals(groupFieldName))
>         {
>             TermDocs termDocs = reader.termDocs(term);
>             GroupTotal groupTotal = null;
>
>             boolean continueThisTerm = true;
>             while ((continueThisTerm) && (termDocs.next()))
>             {
>                 int docID = termDocs.doc();
>                 float docScore = scores[docID];
>                 //TODO include logic to test queryParams.includeZeroScore groups
>                 if ((docScore > 0) && (docScore > minScore))
>                 // if(docScore>minScore)
>                 {
>                     if (groupTotal == null)
>                     {
>                         //look up the group key and initialize
>                         String termText = term.text();
>                         Object key = termText;
>                         if (groupKeyFactory != null)
>                         {
>                             key = groupKeyFactory.getGroupingKey(termText, docID);
>                             if (key == null)
>                             {
>                                 continueThisTerm = false;
>                                 continue;
>                             }
>                         }
>                         groupTotal = (GroupTotal) totals.get(key);
>                         if (groupTotal == null)
>                         {
>                             //no totals exist yet, create new one.
>                             groupTotal = new GroupTotal(params.getReturnDocIdsWithGroups());
>                             groupTotal.setGroupKey(key);
>                             totals.put(key, groupTotal);
>                             groupTotal.addToTotalDocFreq(te.docFreq());
>                         }
>                     }
>                     groupTotal.addQueryMatchDoc(docID, docScore);
>                 }
>             }
>         } else
>         {
>             // Past the last term of the grouping field.
>             break;
>         }
>         if (te.next())
>         {
>             term = te.term();
>         }
>         else
>         {
>             break;
>         }
>     }
>     Collection result = totals.values();
>     GroupTotal[] results = (GroupTotal[]) result.toArray(new GroupTotal[result.size()]);
>     return results;
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]