Hey all,

I'm wondering if anyone has an idea for a solution to the following Lucene
problem. We'd like to group search results into buckets, but I can't find an
efficient way to do so short of modifying the source of the IndexSearcher
class.

A bit of background: we use Lucene to search over messages in our forum
application (Jive). Each message in the forum is a document with some of the
following fields:

 subject, body, threadID, forumID

Multiple messages belong to the same thread. The problem is that search
results are getting overwhelmed with multiple messages in the same thread.
This isn't ideal since multiple messages from a thread are displayed on the
same page (as an example, here's a thread page from a Jive forum:
http://forums.java.sun.com/thread.jsp?forum=45&thread=77362 ). A much better
solution would be to only show one result in the list of hits per thread.

I've played around trying to implement this with a filter, but that approach
doesn't seem feasible, since we can't know statically how the buckets should
be defined. The following wouldn't work as a filter:

 // Loop through all documents in the index and set buckets appropriately.
 for (int i = 0; i < numDocs; i++) {
     Document doc = reader.document(i);
     String fieldValue = doc.get("threadID");
     if (fieldValue != null) {
         if (!buckets.containsKey(fieldValue)) {
             // First message seen for this thread: remember its threadID.
             buckets.put(fieldValue, fieldValue);
         }
         else {
             // Thread already seen: flag this message's bit.
             bits.set(i);
         }
     }
 }

since we can't know which messages in a thread might actually match the
search query, and we'd ideally like the highest rated message in the thread
to be the one that comes through as a hit. Instead, we need to create the
buckets dynamically as the search results start coming through in the
searcher. The algorithm would be:

 * Make an empty map
 * As documents come through as hits, look up each one's threadID in the map.
If the threadID isn't in the map yet, add the document. If the map already
holds a document with the same threadID, discard the new one if it has a lower
hitValue, otherwise replace the stored one.
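
To make the steps above concrete, here is a rough sketch in plain Java. It is
only an illustration, not a Lucene API: the class ThreadBucketCollector, its
Hit type, and the collect(doc, score, threadID) signature are all made up for
this example. It assumes the caller can supply each hit's threadID along with
its document number and score; how to get that cheaply inside the searcher is
exactly the open question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: keeps one hit per thread, the best-scoring one. */
class ThreadBucketCollector {

    /** One retained hit: document number, score, and owning thread. */
    static final class Hit {
        final int doc;
        final float score;
        final String threadID;

        Hit(int doc, float score, String threadID) {
            this.doc = doc;
            this.score = score;
            this.threadID = threadID;
        }
    }

    // threadID -> best hit seen so far for that thread
    private final Map<String, Hit> bestByThread = new HashMap<>();

    /** Called once per matching document (assumed to be driven by the search). */
    void collect(int doc, float score, String threadID) {
        Hit current = bestByThread.get(threadID);
        // Discard the new hit only if it scores lower; otherwise replace.
        if (current == null || score >= current.score) {
            bestByThread.put(threadID, new Hit(doc, score, threadID));
        }
    }

    /** Returns the retained hits, highest score first. */
    List<Hit> results() {
        List<Hit> hits = new ArrayList<>(bestByThread.values());
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }
}
```

The map holds at most one entry per thread, so memory grows with the number of
distinct threads that match rather than with the total number of hits. The
sort happens once at the end, after all hits have been seen.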

So, I think what I'm essentially asking for is a more complicated hit
collector, or a dynamic filter for the searcher. Does anybody know of a
better solution than modifying the low level source?

Thanks,
Matt


_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-users
