[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

Sanne Grinovero (JIRA) Tue, 30 Jun 2015 04:40:20 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608156#comment-14608156 ]


Sanne Grinovero commented on LUCENE-6212:
-----------------------------------------

Hello,
I understand there are good reasons to prevent this for the "average user" but 
I would beg you to restore the functionality for those who know what they are 
doing.

There are perfectly valid use cases for using a different Analyzer at query 
time than at indexing time; for example, when synonyms are handled at indexing 
time, you don't need to apply the substitutions again at query time.
Beyond synonyms, it's also possible to have text from different sources that 
has been pre-processed in different ways, and so needs to be tokenized 
differently to produce consistent output.

I love the idea of Lucene becoming stricter about consistent schema choices, 
but I would hope we could limit that strictness to field types and encoding, 
while Analyzer mappings keep a bit more flexibility?

Would you accept a patch to overload
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>){code}
with the expert version:
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>, Analyzer overrideAnalyzer){code} ?

That would greatly help me migrate to Lucene 5. One alternative is to close 
and reopen the IndexWriter for each Analyzer change, but that would have a 
significant performance impact; I'd rather cheat and pass a mutable Analyzer 
instance, even if that would prevent me from using the IndexWriter 
concurrently.
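
To make the "mutable Analyzer" workaround concrete, here is a minimal sketch of 
the idea in plain Java. It deliberately does not use Lucene: TextAnalyzer and 
SwitchableAnalyzer are hypothetical stand-ins for an Analyzer and a delegating 
wrapper around it, and the two tokenizers are toy examples. It only shows the 
shape of the trick (one long-lived object whose delegate is swapped between 
documents) and why that rules out concurrent use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class SwitchableAnalyzerSketch {

    /** Hypothetical stand-in for an analyzer: text in, tokens out. Not a Lucene type. */
    interface TextAnalyzer extends Function<String, List<String>> {}

    /**
     * A single long-lived analyzer whose behaviour is swapped between documents.
     * The delegate is plain mutable state, so two threads indexing at once could
     * see each other's analyzer: this is the concurrency cost mentioned above.
     */
    static final class SwitchableAnalyzer implements TextAnalyzer {
        private TextAnalyzer delegate;

        SwitchableAnalyzer(TextAnalyzer initial) {
            this.delegate = initial;
        }

        void switchTo(TextAnalyzer next) {
            this.delegate = next;
        }

        @Override
        public List<String> apply(String text) {
            return delegate.apply(text);
        }
    }

    /** Toy analyzer: split on whitespace, keep case. */
    static final TextAnalyzer WHITESPACE =
            text -> Arrays.asList(text.trim().split("\\s+"));

    /** Toy analyzer: split on whitespace, then lowercase each token. */
    static final TextAnalyzer LOWERCASE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        return tokens;
    };

    public static void main(String[] args) {
        // The same analyzer instance is handed to the (imaginary) writer once...
        SwitchableAnalyzer analyzer = new SwitchableAnalyzer(WHITESPACE);
        System.out.println(analyzer.apply("Hello Lucene")); // prints [Hello, Lucene]

        // ...and its delegate is changed before the next document is analyzed.
        analyzer.switchTo(LOWERCASE);
        System.out.println(analyzer.apply("Hello Lucene")); // prints [hello, lucene]
    }
}
```

In a real setup the swap would have to happen strictly between documents on a 
single indexing thread, which is exactly the restriction an explicit 
overrideAnalyzer parameter would avoid.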

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 5.1, Trunk
>
>         Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.



