Hi, I have never written to the list before, but I am doing some development with Lucene. I think Brian's idea is more flexible and extensible. In my application I need three or more kinds of analyzers: one for computing tf-idf statistics, one for indexing (which computes more, e.g. summaries), one for document classification (which computes the document-to-class assignment and stores it outside the index), and some for minor tasks. My experience is that complex Lucene applications have a substantial need for many different Analyzers or, as a better solution, many faces of the same Analyzer at the same time. Something should be done here.
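To make "many faces of the same Analyzer" a bit more concrete, here is a rough sketch in plain Java. It does not use the Lucene API: the names ModalAnalyzerSketch, Mode, analyze and prepareQueryTerm are invented for illustration, and lowercasing only stands in for whatever normalization a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Lucene code: one analyzer exposing two "faces".
public class ModalAnalyzerSketch {

    // The two modes are assumptions chosen for this example.
    enum Mode { TOKENIZE_ONLY, TOKENIZE_AND_NORMALIZE }

    // Splits text on runs of non-alphanumeric characters; in the
    // normalizing mode each token is also lowercased.
    static List<String> analyze(String text, Mode mode) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first piece
            }
            tokens.add(mode == Mode.TOKENIZE_AND_NORMALIZE
                    ? raw.toLowerCase(Locale.ROOT)
                    : raw);
        }
        return tokens;
    }

    // A query-side helper: a raw term (e.g. a prefix) is lowercased only
    // when the index was built with the normalizing face.
    static String prepareQueryTerm(String term, Mode indexMode) {
        return indexMode == Mode.TOKENIZE_AND_NORMALIZE
                ? term.toLowerCase(Locale.ROOT)
                : term;
    }

    public static void main(String[] args) {
        // prints [SPINAL, Cord]
        System.out.println(analyze("SPINAL Cord", Mode.TOKENIZE_ONLY));
        // prints [spinal, cord]
        System.out.println(analyze("SPINAL Cord", Mode.TOKENIZE_AND_NORMALIZE));
        // prints spi
        System.out.println(prepareQueryTerm("SPI", Mode.TOKENIZE_AND_NORMALIZE));
    }
}
```

The point of the sketch is only that indexing, statistics and query preparation could all ask the same object for the face they need, instead of each keeping its own separately configured analyzer.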
Another issue: why did you put document deletion in IndexReader? I guess the main reason was the implementation, but from the API point of view it is horrible. I have an abstraction 'Index' in my code with both add and remove operations, and switching between IndexReader and IndexWriter is not something I enjoy; I am now forced to add a cache for performance. I think one of the reasons is inconsistent support for document ids: deletion assumes that documents can be uniquely identified, while IndexWriter has no such notion. Adding an id to documents would be very helpful for us developers, though it may be very hard to implement.

Last thing: did you ever think about adding transactions to Lucene? They could be very simple exclusive-write transactions, e.g. reads are neither transacted nor isolated, the write is exclusive (I guess it already is in 1.2; I use 1.0), and one may commit or roll back all changes made during the last session. Would it be hard?

With all these issues addressed, Lucene would be mature enough to be used as an indexing engine in mission-critical applications.

Regards,
Michal

----- Original Message -----
From: "Brian Goetz" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, January 22, 2002 12:12 AM
Subject: Re: Case Sensitivity

> > Wildcard queries are case sensitive, while other queries depend on the
> > analyzer used for the field searched. The standard analyzer lowercases, so
> > lowercased terms are indexed. Thus your "SPINAL CORD" query is lowercased
> > and matches the indexed terms "spinal" and "cord". However, since prefixes
> > should not be stemmed they are not run through an analyzer and are hence
> > case sensitive. Your index contains no terms starting with "SPI" or "COR",
> > since all terms were lowercased when indexed.
> >
> > This question is frequent enough that we should probably fix it. Perhaps a
> > method should be added to Analyzer:
> >   public boolean isLowercased(String fieldName);
> > When this is true, the query parser could lowercase prefix and range query
> > terms. Fellow Lucene developers, what do you think of that?
>
> Something should be done, but I'm not sure this is the best way to do
> this. Perhaps extend Analyzer to work in two modes:
> "tokenization-only" and "tokenization + term normalization".

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
