Re: Empty fields ...

Erick Erickson Wed, 19 Jul 2006 06:48:29 -0700

Try something like

TermDocs         termDocs = reader.termDocs();
termDocs.seek(new Term("<relevant field name here>", ""));
while (termDocs.next()) {
   bits.set(termDocs.doc());
}


I *think* (and I'm remembering things folks wrote, haven't done this myself)
that the empty string for the Term matches all terms. If not, you might have
to wrap in in an outer loop that loops through all the elements, something
like

       bits = new BitSet(reader.maxDoc());

       TermDocs         termDocs = reader.termDocs();
       FilteredTermEnum fEnum = new FilteredTermEnum(reader, new
Term(field, ""));

       for (Term term = null; (term = fEnum.term()) != null; fEnum.next())
{
           termDocs.seek(new Term(
                   field,
                   term.text()));

           while (termDocs.next()) {
               bits.set(termDocs.doc());
           }
       }



That said, it may be best for you to loop through each document and add that
doc to the relevant filters if it had the fields you're interested in. You'd
only be fetching each document once, so it'd only be one loop. I don't know
enough about relative efficiencies to make a call here, probably depends
upon how many docs you're dealing with. I'd stop at the first solution that
works with acceptable performance unless you expect your corpus to grow
significantly.... And since this is done in off hours, there's not a
pressing reason to go with the very most efficient solution unless it takes
a too long or you expect to have orders of magnitued more documents in your
index eventually.

Best
Erick

Re: Empty fields ...

Reply via email to