[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

Adrien Grand (JIRA) Tue, 30 Jun 2015 07:17:34 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608354#comment-14608354 ]


Adrien Grand commented on LUCENE-6212:
--------------------------------------

bq. There are perfectly valid use cases to use a different Analyzer at query 
time rather than indexing time

This change doesn't force you to use the same analyzer at index time and at 
search time. It only requires that every document is indexed with the same 
analyzer, the one configured on the IndexWriter.

bq. it's also possible to have text of different sources which has been 
pre-processed in different ways, so needs to be tokenized differently to get a 
consistent output

One way this feature was misused was to handle multi-lingual content, but that 
breaks term statistics: different words can be reduced to the same stem, and a 
single word can be reduced to two different stems depending on the language. In 
general, if different analysis chains are required, it is better to use 
different fields or even different indices.
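
To make the term statistics point concrete, here is a minimal sketch in plain 
Java. It does not use Lucene: the two "stemmers", the tiny in-memory index, and 
the field names body, body_en and body_de are all hypothetical stand-ins chosen 
for illustration. It only shows how mixing analysis chains in one field 
inflates a term's document frequency, and how one field per language avoids it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class PerLanguageFieldsSketch {
    // Toy stand-ins for language-specific stemmers; these are NOT Lucene analyzers.
    static final UnaryOperator<String> EN =
        w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    static final UnaryOperator<String> DE =
        w -> w.endsWith("en") ? w.substring(0, w.length() - 2) : w;

    // field name -> stemmed term -> ids of the documents containing that term
    static final Map<String, Map<String, Set<Integer>>> INDEX = new HashMap<>();

    // Split on whitespace, stem each word, and record the document under the term.
    static void add(int docId, String field, String text, UnaryOperator<String> stemmer) {
        for (String word : text.toLowerCase().split("\\s+")) {
            INDEX.computeIfAbsent(field, f -> new HashMap<>())
                 .computeIfAbsent(stemmer.apply(word), t -> new HashSet<>())
                 .add(docId);
        }
    }

    // Number of distinct documents that contain the term in the given field.
    static int docFreq(String field, String term) {
        return INDEX.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(term, Collections.emptySet())
                    .size();
    }

    public static void main(String[] args) {
        // One shared field, analysis chain chosen per document (the removed pattern).
        add(0, "body", "gifts", EN);   // English "gifts" -> "gift"
        add(1, "body", "gift", DE);    // German "Gift" (poison) -> "gift"
        System.out.println(docFreq("body", "gift"));     // 2: unrelated words share a term

        // One field per language, each field always analyzed the same way.
        add(0, "body_en", "gifts", EN);
        add(1, "body_de", "gift", DE);
        System.out.println(docFreq("body_en", "gift"));  // 1
        System.out.println(docFreq("body_de", "gift"));  // 1

        // The same surface form also stems differently depending on the chain.
        System.out.println(EN.apply("hosen") + " vs " + DE.apply("hosen")); // hosen vs hos
    }
}
```

In Lucene itself, per-field analysis with a single IndexWriter analyzer is 
usually done by wrapping the per-field analyzers in a PerFieldAnalyzerWrapper 
and passing that to IndexWriterConfig.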

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 5.1, Trunk
>
>         Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.



