Well, I can't do 2b; it throws TooManyClauses again. I should have realized that the
wildcard terms are assembled independently of the doc ID restriction....

Erick

On 8/2/06, Erick Erickson <[EMAIL PROTECTED]> wrote:

I'm back with another flavor of wildcards. In what direction would you point
a poor boy whose project lead wants wildcard queries and spans? Here's the
problem....

I cannot use any of the classes that throw a TooManyClauses exception
(e.g. SpanRegexQuery, or SpanNearQuery combined with, say, WildcardQuery). The
corpus is big enough that the exception is guaranteed to be thrown. So currently
I'm using a filter for wildcard queries, populating it via WildcardTermEnum and
TermDocs... It works like a champ. But I don't see how to combine this with
spans...
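For anyone following along, here is a minimal sketch of that filter approach, written against a hypothetical in-memory term dictionary (a sorted map from term to doc IDs) rather than the real Lucene classes. It seeks to the literal prefix before the first wildcard character, roughly the way WildcardTermEnum narrows its scan, tests each term against the pattern, and ORs the matching terms' doc IDs into a java.util.BitSet. Because no query clause is created per expanded term, there is nothing for TooManyClauses to count. The class and method names are made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical in-memory stand-in for one indexed field: term -> doc IDs containing it.
// It mirrors the idea of WildcardTermEnum + TermDocs; it is not Lucene's API.
class WildcardFilterSketch {

    // Literal text before the first wildcard character; only terms with this prefix can match.
    static String prefixOf(String wildcard) {
        int i = 0;
        while (i < wildcard.length() && wildcard.charAt(i) != '*' && wildcard.charAt(i) != '?') i++;
        return wildcard.substring(0, i);
    }

    // Translate '*' (any run of characters) and '?' (exactly one character) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Seek to the prefix in the sorted term dictionary, test each term against the
    // pattern, and OR the matching terms' doc IDs into one bit set. No query clause
    // is built per term, so there is no clause limit to exceed.
    static BitSet wildcardFilter(TreeMap<String, int[]> postings, String wildcard) {
        String prefix = prefixOf(wildcard);
        Pattern pattern = toRegex(wildcard);
        BitSet bits = new BitSet();
        for (Map.Entry<String, int[]> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // sorted: we are past the prefix range
            if (pattern.matcher(e.getKey()).matches()) {
                for (int doc : e.getValue()) bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("lucene", new int[]{0, 2});
        postings.put("lucid", new int[]{3});
        postings.put("search", new int[]{1, 2});
        System.out.println(wildcardFilter(postings, "luc*")); // {0, 2, 3}
    }
}
```

Against a real index, the inner loop over doc IDs would come from TermDocs for each term the enumeration yields.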

It seems to me that spans are incompatible with filters; they're just
different beasts. I see no way to incorporate spans and filters without doing
actual work myself. So it seems I'm left with several alternatives.

1> Figure it out when creating the filter. Conceptually: for each document,
find the positions of the terms I want to span, check whether the distance
between them fits my criteria, and add the doc to the filter only if it
does.

2> Look at the docs returned by the current filtered query and, for each
doc returned, either
  a> examine the term positions and drop the doc if it doesn't fit my span
criteria, or
  b> re-query with a wildcard span, restricted by doc ID. I *think* that
by restricting the query by (Lucene) doc ID I'll be able to avoid the
TooManyClauses issue, assuming that I remember correctly and that the
most restrictive clause is honored when trying this....
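Options 1 and 2a come down to the same positional check; they differ only in where it runs. Below is a minimal sketch over a hypothetical in-memory positional index (term -> doc ID -> token positions) standing in for what Lucene's TermPositions exposes. positionsByDoc collects, per document, the sorted positions of every term a pattern accepts; within walks two sorted position lists to see whether any pair is at most maxDistance apart; buildFilter applies the check while building the filter (option 1), and prune applies it only to the docs an existing wildcard filter already accepted (option 2a). Treating maxDistance as a plain difference of positions is an assumption made for simplicity, not a claim about exactly how SpanNearQuery measures slop. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical positional index: term -> (doc ID -> positions of that term in the doc).
// It stands in for what Lucene's TermPositions exposes; it is not Lucene's API.
class SpanFilterSketch {

    // For every doc, gather the sorted positions of all terms accepted by `matches`
    // (for example, a wildcard translated into a regex).
    static Map<Integer, List<Integer>> positionsByDoc(
            Map<String, Map<Integer, int[]>> index, Predicate<String> matches) {
        Map<Integer, List<Integer>> out = new HashMap<>();
        for (Map.Entry<String, Map<Integer, int[]>> term : index.entrySet()) {
            if (!matches.test(term.getKey())) continue;
            for (Map.Entry<Integer, int[]> doc : term.getValue().entrySet()) {
                List<Integer> list = out.computeIfAbsent(doc.getKey(), k -> new ArrayList<>());
                for (int pos : doc.getValue()) list.add(pos);
            }
        }
        for (List<Integer> list : out.values()) Collections.sort(list);
        return out;
    }

    // True if some position in `a` and some position in `b` are at most maxDistance apart.
    // Both lists are sorted, so advancing the smaller side finds the closest pair in one pass.
    static boolean within(List<Integer> a, List<Integer> b, int maxDistance) {
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int pa = a.get(i), pb = b.get(j);
            if (Math.abs(pa - pb) <= maxDistance) return true;
            if (pa < pb) i++; else j++;
        }
        return false;
    }

    // Option 1: decide while building the filter; a doc's bit is set only if it passes.
    static BitSet buildFilter(Map<Integer, List<Integer>> first,
                              Map<Integer, List<Integer>> second, int maxDistance) {
        BitSet bits = new BitSet();
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            List<Integer> other = second.get(e.getKey());
            if (other != null && within(e.getValue(), other, maxDistance)) bits.set(e.getKey());
        }
        return bits;
    }

    // Option 2a: start from the docs the wildcard filter accepted and keep only
    // those whose positions fit the span criteria.
    static BitSet prune(BitSet candidates, Map<Integer, List<Integer>> first,
                        Map<Integer, List<Integer>> second, int maxDistance) {
        BitSet kept = new BitSet();
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            List<Integer> a = first.get(doc), b = second.get(doc);
            if (a != null && b != null && within(a, b, maxDistance)) kept.set(doc);
            if (doc == Integer.MAX_VALUE) break; // doc + 1 would overflow
        }
        return kept;
    }
}
```

One caveat of this toy: a term that matches both patterns pairs with itself at distance 0, so overlapping wildcards may need extra handling.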

Guys, feel free to hop in here with just the names of the classes I really
want to pay attention to <G>....

I know this is scanty info; what I'm looking for is a very quick
pointer.... What I'm especially hoping for is "Just use the
contrib/JustWhatYouWanted class" <G>, although I poked around and didn't see
anything...

Thanks
Erick
