Re: Case Sensitivity - and more

Michal Plechawski Tue, 22 Jan 2002 00:09:56 -0800

Hi,

I have never written to the list before, but I am in fact doing some
development using Lucene.
I think that Brian's idea is more flexible and extensible. In my
application, I need three or more kinds of analyzers: one for computing
tf-idf statistics, one for indexing (which computes more, e.g. summaries),
one for document classification (which computes the document-to-class
assignment and stores it outside the index), and some for minor things.
My experience shows that in complex Lucene applications there is a
substantial need for many different Analyzers or, as a better solution,
many faces of the same Analyzer at the same time. Something should be done
here.
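
As a rough illustration of "many faces of the same Analyzer", here is a
minimal sketch in plain Java of an analyzer with the two modes Brian
mentions below ("tokenization-only" and "tokenization + term
normalization"). It does not use the Lucene API; the names ModalAnalyzer,
Mode, analyze and normalizePrefix are hypothetical, and lowercasing stands
in for whatever normalization a real analyzer would apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch in plain Java; this is NOT the Lucene Analyzer API.
final class ModalAnalyzer {

    /** The "faces" of the analyzer; the names are illustrative only. */
    enum Mode { TOKENIZE_ONLY, TOKENIZE_AND_NORMALIZE }

    /**
     * Splits text on runs of characters that are neither letters nor digits.
     * In TOKENIZE_AND_NORMALIZE mode every token is also lowercased.
     */
    static List<String> analyze(String text, Mode mode) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element after a leading delimiter
            }
            tokens.add(mode == Mode.TOKENIZE_AND_NORMALIZE
                    ? raw.toLowerCase(Locale.ROOT)
                    : raw);
        }
        return tokens;
    }

    /**
     * Normalizes a prefix-query term the way indexed terms were normalized,
     * without tokenizing it, so "SPI" can match the indexed term "spinal".
     */
    static String normalizePrefix(String prefix, Mode indexMode) {
        return indexMode == Mode.TOKENIZE_AND_NORMALIZE
                ? prefix.toLowerCase(Locale.ROOT)
                : prefix;
    }
}
```

With this sketch, analyze("SPINAL CORD", Mode.TOKENIZE_AND_NORMALIZE)
yields the tokens "spinal" and "cord", and normalizePrefix("SPI",
Mode.TOKENIZE_AND_NORMALIZE) yields "spi", which is a prefix of the first
token. A query parser could call the second method for prefix and range
terms instead of asking the analyzer whether it lowercases.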


Another matter: why did you put document deletion in IndexReader? I guess
the main reason was the implementation, but from the API point of view it is
horrible. I have an 'Index' abstraction in my code with both add and remove
operations, and switching between IndexReader and IndexWriter is not
something I enjoy; I am now forced to add a cache for performance. I
think one of the reasons is inconsistent document id support: delete
assumes that documents can be uniquely identified, while IndexWriter has
nothing like that. I think adding ids to documents would be very helpful
for us developers, but it may be very hard to implement.
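
To make the 'Index' abstraction concrete, here is a minimal in-memory
sketch in plain Java in which one object offers both add and remove, keyed
by a caller-supplied document id. It is not Lucene code and says nothing
about how Lucene would implement this; the class IdKeyedIndex and its
methods are hypothetical, and whitespace splitting plus lowercasing stand
in for a real analyzer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in; this is NOT Lucene code.
final class IdKeyedIndex {
    private final Map<String, Set<String>> termsByDoc = new HashMap<>(); // id -> terms
    private final Map<String, Set<String>> docsByTerm = new HashMap<>(); // term -> ids

    /** Adds the document, replacing any earlier document with the same id. */
    void add(String id, String text) {
        remove(id); // ids are unique, so re-adding means replacing
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        termsByDoc.put(id, terms);
        for (String t : terms) {
            docsByTerm.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    /** Removes the document with this id; returns false if it was not present. */
    boolean remove(String id) {
        Set<String> terms = termsByDoc.remove(id);
        if (terms == null) {
            return false;
        }
        for (String t : terms) {
            Set<String> ids = docsByTerm.get(t);
            ids.remove(id);
            if (ids.isEmpty()) {
                docsByTerm.remove(t);
            }
        }
        return true;
    }

    /** Returns the ids of all documents containing the term, in sorted order. */
    Set<String> search(String term) {
        Set<String> ids = docsByTerm.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new TreeSet<>() : new TreeSet<>(ids);
    }
}
```

Because every document carries an id, the caller never has to think about
which object handles additions and which handles deletions: add("a", ...)
followed by remove("a") leaves the index as it was before.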

One last thing: have you ever thought about adding transactions to Lucene?
Perhaps very simple exclusive-write transactions, where reads are neither
transacted nor isolated, writes are exclusive (I guess they are in 1.2; I
use 1.0), and one may commit or roll back all changes made during the last
session. Would that be hard?
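
The semantics described above can be sketched in a few lines of plain
Java: one writer at a time, reads that simply see the last committed
state, and a session whose changes are either all published or all
discarded. This is only an illustration over an in-memory map, not a
proposal for how Lucene's index files would handle it; TxStore, Session
and all method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical sketch over an in-memory map; this is NOT Lucene code.
final class TxStore {
    private final Semaphore writePermit = new Semaphore(1); // at most one open write session
    private volatile Map<String, String> committed = new HashMap<>();

    /** Reads are not transacted: they see whatever was committed last. */
    String read(String id) {
        return committed.get(id);
    }

    /** Opens a write session, blocking until no other session is open. */
    Session begin() {
        writePermit.acquireUninterruptibly();
        return new Session(new HashMap<>(committed));
    }

    /** Like begin(), but returns null at once if another session is open. */
    Session tryBegin() {
        return writePermit.tryAcquire() ? new Session(new HashMap<>(committed)) : null;
    }

    final class Session {
        private final Map<String, String> pending; // private working copy
        private boolean open = true;

        private Session(Map<String, String> pending) {
            this.pending = pending;
        }

        void put(String id, String doc) { check(); pending.put(id, doc); }

        void delete(String id) { check(); pending.remove(id); }

        /** Publishes all changes of this session in one step. */
        void commit() { check(); committed = pending; close(); }

        /** Discards all changes of this session. */
        void rollback() { check(); close(); }

        private void check() {
            if (!open) throw new IllegalStateException("session already closed");
        }

        private void close() {
            open = false;
            writePermit.release();
        }
    }
}
```

Copying the whole map on begin() keeps the sketch short but would not
scale to a real index; it is here only to show the commit/rollback
contract.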

With all these issues addressed, Lucene would be mature enough to be used as
an indexing engine in mission-critical applications.

Regards,
Michal



----- Original Message -----
From: "Brian Goetz" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, January 22, 2002 12:12 AM
Subject: Re: Case Sensitivity


> > Wildcard queries are case sensitive, while other queries depend on the
> > analyzer used for the field searched.  The standard analyzer lowercases,
> > so lowercased terms are indexed.  Thus your "SPINAL CORD" query is
> > lowercased and matches the indexed terms "spinal" and "cord".  However,
> > since prefixes should not be stemmed they are not run through an
> > analyzer and are hence case sensitive.  Your index contains no terms
> > starting with "SPI" or "COR", since all terms were lowercased when
> > indexed.
> >
> > This question is frequent enough that we should probably fix it.
> > Perhaps a method should be added to Analyzer:
> >   public boolean isLowercased(String fieldName);
> > When this is true, the query parser could lowercase prefix and range
> > query terms.  Fellow Lucene developers, what do you think of that?
>
> Something should be done, but I'm not sure this is the best way to do
> this.  Perhaps extend Analyzer to work in two modes;
> "tokenization-only" and "tokenization + term normalization".


