Hi Mark,
This is a good idea I hadn't thought of, but I don't think it will work
in my case: I need the whole field values (i.e. "Sun Microsystems"),
not just the individual tokens [sun] and [microsystems]. It would work
if the fields were indexed as keywords (untokenized), but in my case
they are not.
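To make the distinction concrete, here is a small plain-Java illustration (no Lucene involved). The lowercasing, whitespace-splitting tokenize() method is an assumed stand-in for a tokenizing analyzer. Counting the distinct terms of a tokenized field gives per-token totals, while an untokenized (keyword) field gives per-value totals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class TokenVsKeywordCounts {

    // Hypothetical stand-in for a tokenizing analyzer: lowercase, split on whitespace.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Documents per term when the field is tokenized (each token counted once per doc).
    static Map<String, Integer> countByToken(List<String> fieldValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            for (String token : new TreeSet<>(tokenize(value))) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Documents per whole value, as an untokenized (keyword) field would index it.
    static Map<String, Integer> countByWholeValue(List<String> fieldValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> companies = List.of("Sun Microsystems", "Sun Microsystems", "Microsoft");
        System.out.println(countByToken(companies));      // {microsoft=1, microsystems=2, sun=2}
        System.out.println(countByWholeValue(companies)); // {Microsoft=1, Sun Microsystems=2}
    }
}
```

Only the second map answers "how many documents mention Sun Microsystems", which is why term-based totals need an untokenized field.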
-chris
On Thu, 10 Mar 2005 07:58:59 +0000 (GMT), mark harwood
<[EMAIL PROTECTED]> wrote:
> >> To get complete statistics like
> >> above, you currently have to iterate through the result
> >> set and pull each Document from the Hits.
>
> Not necessarily true. You can use TermVectors or an
> indexed field, e.g. "doctype", to derive these statistics
> without stored fields. Here's an example of how I've
> done it before using indexed fields. I've been meaning
> to tidy this up and contribute it, as it looks like
> it could be generally useful. The "GroupingKeyFactory" is
> an abstraction that lets you process a term before
> using it for totalling, e.g. to group dates by year
> rather than by full date.
>
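As a rough illustration of the key-factory idea, here is a sketch of a year-grouping factory. It assumes the interface consists only of the getGroupingKey(String termText, int docID) method called in the code below (the real interface is not shown) and that dates are indexed as yyyyMMdd terms. YearGroupingDemo and YearGroupingKeyFactory are hypothetical names.

```java
public class YearGroupingDemo {

    // Hypothetical shape of the interface, inferred from the call
    // groupKeyFactory.getGroupingKey(termText, docID) in the code below.
    interface GroupingKeyFactory {
        // Returns the key to total under, or null to skip the term.
        Object getGroupingKey(String termText, int docID);
    }

    // Assumes date terms are indexed as yyyyMMdd strings such as "20050310".
    static class YearGroupingKeyFactory implements GroupingKeyFactory {
        public Object getGroupingKey(String termText, int docID) {
            if (termText == null || termText.length() != 8) {
                return null; // not a yyyyMMdd term: no group
            }
            for (int i = 0; i < termText.length(); i++) {
                char c = termText.charAt(i);
                if (c < '0' || c > '9') {
                    return null; // non-digit: no group
                }
            }
            return termText.substring(0, 4); // keep only the year
        }
    }

    public static void main(String[] args) {
        GroupingKeyFactory byYear = new YearGroupingKeyFactory();
        System.out.println(byYear.getGroupingKey("20050310", 0)); // 2005
        System.out.println(byYear.getGroupingKey("20041231", 1)); // 2004
        System.out.println(byYear.getGroupingKey("unknown", 2));  // null
    }
}
```

Returning null makes the caller stop processing that term's documents, which is how the method below treats a null key.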
> // Needs java.io.IOException; java.util.{Arrays, Collection, HashMap};
> // org.apache.lucene.index.{Term, TermDocs, TermEnum};
> // org.apache.lucene.search.{HitCollector, IndexSearcher, Query};
> // org.apache.lucene.queryParser.{ParseException, QueryParser}.
> // "reader" and "analyzer" are fields of the enclosing class.
> protected GroupTotal[] groupByIndexTokens(GroupQueryParams params)
>         throws ParseException, IOException
> {
>     final HashMap totals = new HashMap();
>     final GroupingKeyFactory groupKeyFactory = params.getGroupKeyFactory();
>     String groupFieldName = params.getGroupFieldName();
>     //TODO IndexSearcher should be passed in and reused?
>     IndexSearcher searcher = new IndexSearcher(reader);
>     float minScore = params.getMinDocScore();
>     // Doc IDs run up to maxDoc() - 1; numDocs() excludes deleted docs
>     // and would be too small once the index has deletions.
>     final float scores[] = new float[reader.maxDoc()];
>     String queryString = params.getQuery();
>
>     if ((queryString == null) || (queryString.trim().length() == 0))
>     {
>         //TODO if query is null then we could optimise counting by just
>         // taking docFreq from TermEnum and avoiding use of TermDocs?
>         Arrays.fill(scores, 1);
>     }
>     else
>     {
>         // Record each matching doc's score; non-matching docs stay at 0.
>         Query query = QueryParser.parse(queryString, "contents", analyzer);
>         searcher.search(query, null, new HitCollector()
>         {
>             public void collect(int docID, float score)
>             {
>                 scores[docID] = score;
>             }
>         });
>     }
>
>     // Terms are sorted by field, then text, so this starts at the first
>     // term of the group field and we stop once the field changes.
>     TermEnum te = reader.terms(new Term(groupFieldName, ""));
>     Term term = te.term();
>     while (term != null)
>     {
>         if (term.field().equals(groupFieldName))
>         {
>             TermDocs termDocs = reader.termDocs(term);
>             GroupTotal groupTotal = null;
>
>             boolean continueThisTerm = true;
>             while (continueThisTerm && termDocs.next())
>             {
>                 int docID = termDocs.doc();
>                 float docScore = scores[docID];
>                 //TODO include logic to test queryParams.includeZeroScore groups
>                 if ((docScore > 0) && (docScore > minScore))
>                 //  if(docScore>minScore)
>                 {
>                     if (groupTotal == null)
>                     {
>                         //look up the group key and initialize
>                         String termText = term.text();
>                         Object key = termText;
>                         if (groupKeyFactory != null)
>                         {
>                             key = groupKeyFactory.getGroupingKey(termText, docID);
>                             if (key == null)
>                             {
>                                 // no group for this term: skip its remaining docs
>                                 continueThisTerm = false;
>                                 continue;
>                             }
>                         }
>                         groupTotal = (GroupTotal) totals.get(key);
>                         if (groupTotal == null)
>                         {
>                             //no totals exist yet, create new one.
>                             groupTotal = new GroupTotal(params.getReturnDocIdsWithGroups());
>                             groupTotal.setGroupKey(key);
>                             totals.put(key, groupTotal);
>                         }
>                         // Add this term's docFreq even if the group already exists,
>                         // so every term mapped to one key (e.g. dates grouped by
>                         // year) counts towards that group's total.
>                         groupTotal.addToTotalDocFreq(te.docFreq());
>                     }
>                     groupTotal.addQueryMatchDoc(docID, docScore);
>                 }
>             }
>             termDocs.close();
>         }
>         else
>         {
>             break;
>         }
>         if (te.next())
>         {
>             term = te.term();
>         }
>         else
>         {
>             break;
>         }
>     }
>     te.close();
>     Collection result = totals.values();
>     GroupTotal[] results = (GroupTotal[]) result.toArray(new GroupTotal[result.size()]);
>     return results;
> }
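The core of the technique (enumerate a field's terms, then total the query matches per term's group) can be sketched without Lucene. In this hypothetical in-memory version, a sorted map of term to doc IDs stands in for TermEnum/TermDocs, the scores array plays the role of the HitCollector output, and a java.util.function.Function replaces the GroupingKeyFactory (without the docID argument). InMemoryGroupBy and Group are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

public class InMemoryGroupBy {

    // Per-group totals: query matches in the group, and the summed docFreq of its terms.
    static final class Group {
        int matchCount;
        int totalDocFreq;

        @Override
        public String toString() {
            return matchCount + "/" + totalDocFreq;
        }
    }

    // postings: term text -> doc IDs containing it (stand-in for TermEnum + TermDocs).
    // scores:   per-doc score, 0 meaning "did not match" (stand-in for the HitCollector array).
    // keyFn:    term -> group key, or null to skip the term (stand-in for GroupingKeyFactory).
    static Map<Object, Group> groupBy(SortedMap<String, int[]> postings, float[] scores,
                                      float minScore, Function<String, Object> keyFn) {
        Map<Object, Group> totals = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            int[] docs = entry.getValue();
            Group group = null;
            for (int docID : docs) {
                float score = scores[docID];
                if (score <= 0 || score <= minScore) {
                    continue; // not a qualifying match
                }
                if (group == null) {
                    // First qualifying doc for this term: resolve its group once.
                    Object key = keyFn.apply(entry.getKey());
                    if (key == null) {
                        break; // no group for this term: skip its remaining docs
                    }
                    group = totals.computeIfAbsent(key, k -> new Group());
                    group.totalDocFreq += docs.length; // this term's docFreq
                }
                group.matchCount++;
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> dates = new TreeMap<>();
        dates.put("20040101", new int[] {0, 1});
        dates.put("20050310", new int[] {2});
        dates.put("20050311", new int[] {3, 4});
        float[] scores = {0.5f, 0f, 0.9f, 0.2f, 0f};
        // Group dates by year; prints matches/totalDocFreq per year.
        System.out.println(groupBy(dates, scores, 0f, t -> t.substring(0, 4))); // {2004=1/2, 2005=2/3}
    }
}
```

Only the postings and a per-document score array are read, never stored fields, which is what makes this cheaper than pulling each Document from the Hits.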
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]