[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

Shai Erera (JIRA) Sun, 01 Feb 2015 03:05:45 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300154#comment-14300154 ]


Shai Erera commented on LUCENE-6212:
------------------------------------

That doesn't help. If all your documents have 'title' and 'body' fields (plus
an additional 'language' field), you want the content to be indexed under the
'title' and 'body' fields, not under 'title_en' and 'title_de'. Well, maybe you
do, or should, but the point is that you have a single schema for your
documents. The only thing that changes is how they are tokenized, and that is
decided on a per-document basis, depending on each document's language.
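
To make the scenario concrete, here is a minimal sketch of the idea in plain
Python. It is a toy model only and does not use Lucene's API: the names
analyze_en, analyze_de, ANALYZERS and index_document, the tiny stop-word lists,
and the sample documents are all hypothetical. It shows one schema ('title',
'body') where only the tokenization varies with each document's 'language'
value.

```python
import re
from collections import defaultdict

# Hypothetical per-language "analyzers": each turns text into a token list.
def analyze_en(text):
    # Lowercase, keep runs of ASCII letters, drop a few English stop words.
    stop = {"the", "a", "of"}
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop]

def analyze_de(text):
    # Same idea with German letters and a few German stop words.
    stop = {"der", "die", "das"}
    return [t for t in re.findall(r"[a-zäöüß]+", text.lower()) if t not in stop]

# Assumed mapping from the document's 'language' value to its analyzer.
ANALYZERS = {"en": analyze_en, "de": analyze_de}

def index_document(index, doc_id, doc):
    # Pick the analyzer per document, but always index under the shared
    # field names 'title' and 'body' (never 'title_en' or 'title_de').
    analyze = ANALYZERS[doc["language"]]
    for field in ("title", "body"):
        for token in analyze(doc[field]):
            index[(field, token)].add(doc_id)

# Toy inverted index: (field, token) -> set of document ids.
index = defaultdict(set)
index_document(index, 1, {"language": "en",
                          "title": "The History of Search",
                          "body": "A short text"})
index_document(index, 2, {"language": "de",
                          "title": "Die Geschichte der Suche",
                          "body": "Das ist kurz"})
```

Both documents end up under the same 'title' and 'body' fields. The same
surface word can still be treated differently depending on the document's
language: in this sketch "die" is kept by analyze_en but dropped by analyze_de.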

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk, 5.1
>
>         Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.



