> Currently it is easy to use different analyzers for different purposes, no?
> I'm not sure how Brian's proposal (bi-modal analyzers: tokenize only &
> tokenize+normalize) addresses your needs.

Ok, maybe I put my point a bit misleadingly. Brian's proposal, as I see it, was to _group_ two tokenizers that differ in a single respect. The query parser would then use TWO analyzers: one for things that need normalization and another for things that need no normalization. It is extremely important that these two analyzers be compatible (i.e. differ only in the normalization step), especially for applications juggling many types of analyzers (e.g. multilingual ones). It must never happen that the normalized analyzer is English while the unnormalized one is German, for example, and the Lucene API should support dealing with this, e.g. by providing something like an Analyzers class with two parts, normalized() and unnormalized().
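To make this concrete, here is a rough sketch of what I have in mind. The Analyzers class, its constructor and the two accessors are just placeholders for illustration; nothing like this exists in the current API, only org.apache.lucene.analysis.Analyzer itself is real.

import org.apache.lucene.analysis.Analyzer;

/**
 * Placeholder sketch, not an existing Lucene class: a pair of analyzers
 * that are compatible by construction, i.e. intended for the same
 * language and differing only in whether normalization is applied.
 */
public class Analyzers {

    private final Analyzer normalized;
    private final Analyzer unnormalized;

    public Analyzers(Analyzer normalized, Analyzer unnormalized) {
        this.normalized = normalized;
        this.unnormalized = unnormalized;
    }

    /** Analyzer that tokenizes and normalizes (lowercasing, stemming, ...). */
    public Analyzer normalized() {
        return normalized;
    }

    /** Analyzer that only tokenizes, without any normalization. */
    public Analyzer unnormalized() {
        return unnormalized;
    }
}

The query parser (or the indexing code) would then be handed a single Analyzers object and pick normalized() or unnormalized() per field, so it becomes impossible to accidentally pair an English normalized analyzer with a German unnormalized one.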
> Yes, sorry. I wonder if it would have been better to instead call
> IndexWriter IndexAdder or something, to make clear that it can only add
> documents. Perhaps someday this can be fixed.

I agree it would be better to call it IndexAdder. I guess it would be a major architectural change to add the possibility to:

1) identify a document with a numeric unique id,
2) check that this id is unique, and
3) delete the document with a given id by calling an IndexWriter method.

Ok, we can live without this, but document uniqueness and identification would be very helpful for any "mission-critical" application of Lucene, where duplicate documents are unacceptable and the index changes quite often. (A sketch of how this can be approximated with the current API follows after my signature.)

> That is in fact what is done in 1.2.

Thanks, I didn't know.

Regards,
Michal
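P.S. Just to illustrate points 1)-3): a rough sketch of how an application can approximate them today, using a unique-id keyword field and going through IndexReader for the delete, since IndexWriter itself cannot delete. The UniqueIndex class, its methods and the "uid" field name are made up for illustration; only the imported Lucene classes are real API.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class UniqueIndex {

    private static final String UID_FIELD = "uid";

    private final String indexPath;
    private final Analyzer analyzer;

    public UniqueIndex(String indexPath, Analyzer analyzer) {
        this.indexPath = indexPath;
        this.analyzer = analyzer;
    }

    /** 1) + 2): add a document under a numeric id, refusing duplicates. */
    public void add(long id, Document doc) throws IOException {
        if (exists(id)) {
            throw new IOException("duplicate document id: " + id);
        }
        doc.add(Field.Keyword(UID_FIELD, String.valueOf(id)));
        // assumes the index already exists (create == false)
        IndexWriter writer = new IndexWriter(indexPath, analyzer, false);
        try {
            writer.addDocument(doc);
        } finally {
            writer.close();
        }
    }

    /** 3): delete the document(s) carrying the given id, via IndexReader. */
    public void delete(long id) throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            TermDocs docs = reader.termDocs(new Term(UID_FIELD, String.valueOf(id)));
            while (docs.next()) {
                reader.delete(docs.doc());
            }
            docs.close();
        } finally {
            reader.close();
        }
    }

    private boolean exists(long id) throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            TermDocs docs = reader.termDocs(new Term(UID_FIELD, String.valueOf(id)));
            boolean found = docs.next();
            docs.close();
            return found;
        } finally {
            reader.close();
        }
    }
}

Of course this is exactly the kind of bookkeeping that would be nicer to have inside Lucene itself.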
