I'm back, with another flavor of wildcards. What direction would you point a poor boy whose project lead wants wildcard queries and spans? Here's the problem...
I cannot use any of the classes that throw a "TooManyClauses" exception (e.g. SpanRegexQuery, or SpanNearQuery with, say, WildcardQuery). The corpus is big enough that this is guaranteed to be thrown.

So, currently I'm using a filter for wildcard queries, populating it via WildcardTermEnum and TermDocs. Works like a champ. But I don't see how to combine this with spans. It seems to me that spans are incompatible with filters; they're just different beasts. I see no way to incorporate spans and filters without doing actual work myself.

So, it seems I'm left with several alternatives.

1> Figure it out when creating the filter. Conceptually, for each document, find the positions of the terms I want to span, check whether the distance between them fits my criteria, and only add the doc to the filter if the distance is within my parameters.

2> Look at the docs returned by the current filtered process and, for each doc returned, either
   a> don't add it if it doesn't fit my span criteria, by examining the term positions, or
   b> re-query with a wildcard span, restricted by doc ID. I *think* that by restricting the query by (Lucene) doc ID I'll be able to avoid the "too many clauses" issue. That assumes I remember correctly and that the most restrictive clause is honored when trying this.

Guys, feel free to hop in here with just the names of the classes I really want to pay attention to <G>. I know this is scanty info; what I'm looking for is a very quick pointer. What I'm especially looking for is "Just use the contrib/JustWhatYouWanted class" <G>, although I poked around and didn't see anything.

Thanks
Erick
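
P.S. To make 1> and 2a> a bit more concrete, here is a rough sketch of the per-document position check I have in mind. It is plain Java with no Lucene calls: the class and method names (PositionWindow, mergePositions, withinDistance) are made up for illustration, and I'm assuming the positions for each expanded wildcard term have already been pulled out of the index for one document (presumably via TermPositions.nextPosition()). The maxDistance here is a simple difference in term positions, which is my assumption and not necessarily the same thing as the slop that SpanNearQuery uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene.
class PositionWindow {

    // Merge the position lists of every term one wildcard expanded to
    // (all within a single document) into one ascending array.
    static int[] mergePositions(int[][] perTermPositions) {
        int total = 0;
        for (int[] positions : perTermPositions) {
            total += positions.length;
        }
        int[] merged = new int[total];
        int next = 0;
        for (int[] positions : perTermPositions) {
            System.arraycopy(positions, 0, merged, next, positions.length);
            next += positions.length;
        }
        Arrays.sort(merged);
        return merged;
    }

    // True if some position in a and some position in b differ by at most
    // maxDistance. Both arrays must be sorted ascending. Order of the two
    // sides is ignored (an assumption; an ordered span would need more).
    static boolean withinDistance(int[] a, int[] b, int maxDistance) {
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int diff = a[i] - b[j];
            if (Math.abs(diff) <= maxDistance) {
                return true;
            }
            // Advance whichever side currently has the smaller position;
            // it cannot get any closer to later positions on the other side.
            if (diff < 0) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up positions: the first wildcard expanded to two terms in this doc,
        // the second wildcard to one term.
        int[] first = mergePositions(new int[][] { {3, 40}, {17} });
        int[] second = mergePositions(new int[][] { {21} });

        System.out.println(withinDistance(first, second, 5)); // true  (17 and 21)
        System.out.println(withinDistance(first, second, 3)); // false
    }
}
```

The idea would be to run this check per candidate document and only set the doc's bit in the filter when it returns true. Whether that is fast enough on a big corpus is exactly what I don't know.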