[ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300135#comment-14300135 ]
Shai Erera commented on LUCENE-6212:
------------------------------------
How do you index multi-lingual documents in one index then? We used to do it by
pulling the correct Analyzer for the document's language and calling addDoc(doc,
langAnalyzer). What's the alternative without that API? Is there an easy
alternative, or should we add every field to the document with a language-specific
TokenStream? That is much less convenient, but it is still an alternative.
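One alternative to a per-document analyzer is to give each language its own field (for example `body_en`, `body_de`) and pick the analyzer by field name, which is what Lucene's `PerFieldAnalyzerWrapper` does; the other option mentioned above is to hand each field a pre-analyzed `TokenStream` (e.g. via the `TextField(String, TokenStream)` constructor). The sketch below is a dependency-free Java model of the field-routing idea, not Lucene code: the class `LanguageFieldRouting`, the `_<lang>` suffix convention, and the toy English/German analysis are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy, Lucene-free model of per-field analyzer routing (the idea behind
// PerFieldAnalyzerWrapper). Every name here is hypothetical.
final class LanguageFieldRouting {

    // Split text into runs of letters, dropping empty tokens.
    static List<String> letterTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Toy "English" analysis: lowercase, then split on non-letters.
    static final Function<String, List<String>> ENGLISH =
            text -> letterTokens(text.toLowerCase(Locale.ROOT));

    // Toy "German" analysis: lowercase, fold a-umlaut, o-umlaut, u-umlaut
    // and sharp s to a, o, u and ss, then split on non-letters.
    static final Function<String, List<String>> GERMAN =
            text -> letterTokens(text.toLowerCase(Locale.ROOT)
                    .replace("\u00e4", "a").replace("\u00f6", "o")
                    .replace("\u00fc", "u").replace("\u00df", "ss"));

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> defaultAnalyzer;

    LanguageFieldRouting(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Hypothetical naming convention: base field name + "_" + language code.
    static String fieldFor(String base, String lang) {
        return base + "_" + lang;
    }

    void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // The same lookup serves indexing and querying, so both sides agree.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // True if any analyzed query term appears among the document's terms.
    boolean matches(String field, String docText, String queryText) {
        List<String> docTerms = analyze(field, docText);
        for (String q : analyze(field, queryText)) {
            if (docTerms.contains(q)) {
                return true;
            }
        }
        return false;
    }
}
```

Because queries go through the same routing, a query against `body_de` is folded exactly as the German text was at index time, while the English field never applies German folding. In a real Lucene application the analogue would be one `PerFieldAnalyzerWrapper` (a default analyzer plus a `Map<String, Analyzer>` of per-field analyzers) passed both to `IndexWriterConfig` and to the query parser; the cost of this design is that searches must target the right language field(s).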
Is it worth having a CHANGES / MIGRATION entry for this? I think if users
depend on that API for good reasons (i.e. it's not a 'trap' for them), the
removal should be mentioned somewhere.
> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
> Key: LUCENE-6212
> URL: https://issues.apache.org/jira/browse/LUCENE-6212
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, Trunk, 5.1
>
> Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.
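To make the trap concrete, here is a toy inverted index in plain Java (hypothetical names, not Lucene's API). Queries are always analyzed with the analyzer supplied at construction, standing in for the one given via `IndexWriterConfig`, while `addDocument` also has an overload that accepts its own analyzer, standing in for the per-document API this issue removes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy inverted index showing how a per-document analyzer can make a document
// unreachable at search time. Not Lucene code; all names are hypothetical.
final class AnalyzerMismatchDemo {

    // Split on whitespace, dropping empty tokens.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static final Function<String, List<String>> LOWERCASING =
            text -> whitespaceTokens(text.toLowerCase(Locale.ROOT));
    static final Function<String, List<String>> CASE_PRESERVING =
            AnalyzerMismatchDemo::whitespaceTokens;

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Plays the role of the analyzer given up front via IndexWriterConfig.
    private final Function<String, List<String>> configAnalyzer;

    AnalyzerMismatchDemo(Function<String, List<String>> configAnalyzer) {
        this.configAnalyzer = configAnalyzer;
    }

    // Normal path: index with the configured analyzer.
    void addDocument(int id, String text) {
        addDocument(id, text, configAnalyzer);
    }

    // Mirrors the per-document overload: any analyzer may be passed in.
    void addDocument(int id, String text, Function<String, List<String>> analyzer) {
        for (String term : analyzer.apply(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Queries are always analyzed with the configured analyzer.
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : configAnalyzer.apply(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

If the index is configured with `LOWERCASING` and document 2 is added with `CASE_PRESERVING`, its terms are `Apache` and `Lucene`, while every query is lowercased; document 2 therefore never matches, even though its text is identical to a document 1 indexed the normal way.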