Some details about the number of documents, their size, the hardware
used, and what you consider fast would help in answering, but my gut
feeling is that this problem would run faster on a database
than with full-text search.

On Fri, Dec 12, 2003 at 09:45:53AM -0400, Karl Penney wrote:
> I'm looking for a fast way (execution wise) to get a list of unique values for a 
> field called "partno" for all documents which have a given value for a field called 
> "type".  This is for adding values to a drop-down list.
> 
> What I have done so far is to build a list of document numbers for each value of the 
> "type" field (using TermEnum and TermDocs).  I then use TermEnum to get a list of 
> values for the "partno" field.  This returns values for all of the documents, so for 
> each "partno" value I use TermDocs to get a list of document numbers for the 
> "partno" and then cross check this list against the document list for "type".  If at 
> least one of the "partno" documents is in the "type" list then I know that the 
> "partno" value is valid for that "type" value.
> 
> This works ok, but is not as fast as I would like it to be.  Any ideas on how to 
> improve it?
> 
> Here is some sample code:
> 
> // build list of document numbers for each value of the type field
> TermEnum te = reader.terms(new Term("type", ""));
> typeMap = new HashMap();
> while (te.term() != null && "type".equals(te.term().field())) {
>   String type = te.term().text();
>   TermDocs docs = reader.termDocs(new Term("type", type));
>   ArrayList typelist = new ArrayList();
>   while (docs.next()) {
>     String docnum = String.valueOf(docs.doc());
>     typelist.add(docnum);
>   }
>   typeMap.put(type, typelist);
>   te.next(); 
> }
> 
> // add partno values to dropdown for type-1
> ArrayList doclist = (ArrayList)typeMap.get("type-1");
> if (doclist != null) {
>     // get unique values for the partno field from the index
>     TermEnum te = reader.terms(new Term("partno", ""));
>     while (te.term() != null && "partno".equals(te.term().field())) {
>         String value = te.term().text();
>         // check that at least one document with the "partno" value is in the list for type-1
>         TermDocs docs = reader.termDocs(new Term("partno", value));
>         while (docs.next()) {
>             String docnum = String.valueOf(docs.doc());
>             if (doclist.contains(docnum)) {
>                 // add value to drop down
>                 cbo.add(value.toUpperCase());
>                 break;
>             }
>         }
>         te.next();            
>     }
> }
> 
> 
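
If you stay with Lucene, one cheap thing to try in the code quoted above:
doclist.contains(docnum) on an ArrayList of Strings is a linear scan for
every lookup. A java.util.BitSet indexed by document number makes the
membership test constant time and avoids the String.valueOf() calls. Below
is a rough sketch of the idea, not tested against a real index. The int
arrays and the Map stand in for the TermDocs and TermEnum loops, and the
names PartnoFilter, toBitSet and partnosForType are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; plain arrays stand in for Lucene's TermDocs/TermEnum.
class PartnoFilter {

    // Collect the document numbers for one "type" value into a BitSet.
    static BitSet toBitSet(int[] docNumbers) {
        BitSet bits = new BitSet();
        for (int doc : docNumbers) {
            bits.set(doc);
        }
        return bits;
    }

    // Return each partno value that has at least one document in typeDocs.
    // partnoPostings maps a partno value to the documents containing it.
    static List<String> partnosForType(Map<String, int[]> partnoPostings,
                                       BitSet typeDocs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : partnoPostings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (typeDocs.get(doc)) {          // constant-time lookup
                    result.add(entry.getKey().toUpperCase());
                    break;                        // one match is enough
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data: documents 0, 2 and 5 have type "type-1".
        BitSet typeDocs = toBitSet(new int[] {0, 2, 5});

        Map<String, int[]> partnoPostings = new LinkedHashMap<>();
        partnoPostings.put("a-100", new int[] {0, 1});
        partnoPostings.put("b-200", new int[] {1, 3});
        partnoPostings.put("c-300", new int[] {4, 5});

        System.out.println(partnosForType(partnoPostings, typeDocs));
        // prints [A-100, C-300]
    }
}
```

Against a real index you would call bits.set(docs.doc()) inside the
while (docs.next()) loop for the type, and typeDocs.get(docs.doc()) inside
the partno loop. Whether that is fast enough still depends on how many
distinct partno values the index holds.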

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
