> make both a stemmed field and an unstemmed field

While this approach is easy and would work, it means increasing the size of the index and reindexing every document. However, the information is already available in the existing field, and runtime analysis is certainly faster than more disk I/O.

> Changing the index at query time

No... I'm sure I left out enough details to be confusing. The additional fields in our current workaround need to be managed periodically over the course of a document's lifetime, not at query time. To be honest, I'm not involved in the workaround. I'm just trying to avoid going down the wrong path.

> Query.html#rewrite

So it sounds like I should attempt to write my own InverseWildcardQuery class which overrides this method. Perhaps I can use WildcardQuery as an example? WildcardQuery has changed parent classes over recent versions, and we're unfortunately using an older version at the moment.

I guess I had hoped others had handled this type of query before. Again, we're just trying to find all documents that have at least one field that doesn't match the wildcard query.

----- Original Message ----
From: Steven A Rowe <sar...@syr.edu>
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Sent: Fri, July 30, 2010 3:01:14 PM
Subject: RE: InverseWildcardQuery

> > you want what Lucene already does, but that's clearly not true
>
> Hmmm, let's pretend that "contents" field in my example wasn't analyzed at index
> time. The unstemmed form of terms will be indexed. But if I query with a stemmed
> form or use QueryParser with the SnowballAnalyzer, I'm not going to get a match.
> I could fix this situation by analyzing the indexed field at search time to
> match the query. I don't know that Lucene provides this opportunity and, as I
> said, maybe that's crazy.

I don't know of any facility in Lucene to re-analyze indexed content at search time. This is an IR anti-pattern - it defeats the purpose of constructing the inverted index. People generally decide prior to indexing what kind of queries they need to support, and then perform all required document analysis at index time.
To use your stemming example, if you need to be able to match against both stemmed and unstemmed forms, you would make both a stemmed field and an unstemmed field, and then construct corresponding sub-queries against each as required.

> > What do you mean when you say "rewriting"
>
> "Our current path to solving our problem requires additional fields which need
> rewritten". I meant actually altering the document in the index. My desire has
> been to write a new Query class implementation whereas you mentioned query
> rewriting (isn't that accomplished as in my example by passing an Analyzer to
> QueryParser?)

Changing the index at query time, unless you have a tiny set of documents, sounds like a big mistake to me. Again, extremely expensive.

Query rewriting:
<http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Query.html#rewrite%28org.apache.lucene.index.IndexReader%29>

> > not discrete... limited number of prefixes
>
> So my document may have "myfield:A*foo*", "myfield:B*foo*",
> "myfield:A*dog*", and "myfield:D*cat*".
>
> Or, to phrase differently, "myfield:[PREFIX][PATTERN]" may appear any
> number of times where PREFIX comes from the set { A, B, C, D, E, ... }.
>
> This complexity is really a tangent of my question in order to avoid poor
> performance from WildcardQuery.

I still think you could make one field for each PREFIX, and then do whatever query you want on each of those fields. If you need to support "contains" functionality (e.g. "A:*foo*"), you might want to look into ngram analysis at indexing time (and maybe also at query time, depending on the source of your queries):
<http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/ngram/NGramTokenFilter.html>

You would have to know the minimum and maximum length of the contained string you would want to query for.

Steve
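
To illustrate the ngram suggestion above, here is a minimal sketch in plain Java of why indexing character n-grams turns a "contains" query into an exact term lookup, and why the minimum and maximum gram lengths have to be chosen up front. This is not Lucene code and does not reproduce NGramTokenFilter's behavior; the class name NGramContainsSketch, the gram lengths 3 to 5, and the sample values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for a single field; illustrative only, not Lucene.
class NGramContainsSketch {
    // Assumed bounds: only substrings of 3 to 5 characters can be queried.
    static final int MIN_GRAM = 3;
    static final int MAX_GRAM = 5;

    // gram -> ids of the documents whose field value contains that gram
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // All substrings of value with length between min and max, shortest first.
    static List<String> ngrams(String value, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= value.length(); i++) {
                grams.add(value.substring(i, i + n));
            }
        }
        return grams;
    }

    // "Index time": record every gram of the field value for this document.
    void add(int docId, String value) {
        for (String gram : ngrams(value, MIN_GRAM, MAX_GRAM)) {
            postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Query time": *needle* becomes one exact lookup, with no term scan.
    Set<Integer> contains(String needle) {
        if (needle.length() < MIN_GRAM || needle.length() > MAX_GRAM) {
            throw new IllegalArgumentException(
                "needle length must be between " + MIN_GRAM + " and " + MAX_GRAM);
        }
        return postings.getOrDefault(needle, Collections.emptySet());
    }

    public static void main(String[] args) {
        NGramContainsSketch index = new NGramContainsSketch();
        index.add(1, "seafood");
        index.add(2, "doghouse");
        index.add(3, "barefoot");

        System.out.println(ngrams("abcd", 2, 3));   // [ab, bc, cd, abc, bcd]
        System.out.println(index.contains("foo"));  // [1, 3]
        System.out.println(index.contains("dog"));  // [2]
        System.out.println(index.contains("cat"));  // []
    }
}
```

The trade-off is the one raised earlier in the thread: every value is stored as many grams, so the index grows, and a needle shorter than the minimum or longer than the maximum gram length cannot be answered by a single lookup.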