[
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873397#action_12873397
]
Michael McCandless commented on LUCENE-2295:
--------------------------------------------
bq. Further investigations showed that there is some difference between using
this filter/analyzer and the current setting in IndexWriter. IndexWriter uses
the given MaxFieldLength as the maximum value for all instances of the same
field name. So if you add 100 fields "foo" (each with 1,000 terms) and have the
default of 10,000 tokens, DocInverter will index 10 of these field instances
(10,000 terms in total) and the rest will be suppressed.

In LUCENE-2450 I'm experimenting with having multi-valued fields be handled
entirely by an analyzer stage, i.e., the logical concatenation of tokens (with
gaps) would be "hidden" from IW, and IW would think it's dealing with a single
token stream. In this model, if you then appended the new LimitTokenCountFilter
to the end, I think it'd result in the same behavior as maxFieldLength today.

But even before we eventually switch to that model, can't we still deprecate
IW's maxFieldLength on 3x (and remove it from trunk) now? I realize the
limiting is different (applying the limit pre- vs. post-concatenation), but I
think the javadocs can explain this difference. I think it's unlikely that apps
are relying on this specific interaction of truncation and multi-valued
fields...
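
To make the pre- vs. post-concatenation difference concrete, here is a minimal
sketch in plain Java. It deliberately does not use Lucene's API:
LimitingIterator and TruncationDemo are hypothetical names that only model a
token-counting filter and the two places the limit could be applied, and
position-increment gaps between field instances are ignored. It reuses the
numbers from the quoted example (100 instances of "foo", 1,000 terms each,
limit 10,000).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a token-count-limiting filter (not Lucene's API):
// passes through at most maxTokens elements of the wrapped iterator.
final class LimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> in;
    private final int maxTokens;
    private int seen = 0;

    LimitingIterator(Iterator<T> in, int maxTokens) {
        this.in = in;
        this.maxTokens = maxTokens;
    }

    @Override
    public boolean hasNext() {
        return seen < maxTokens && in.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        seen++;
        return in.next();
    }
}

final class TruncationDemo {
    // Counts how many tokens an iterator yields.
    static int drain(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    // Limit applied pre-concatenation: every field instance gets a fresh counter.
    static int limitPerInstance(List<List<String>> instances, int max) {
        int total = 0;
        for (List<String> instance : instances) {
            total += drain(new LimitingIterator<>(instance.iterator(), max));
        }
        return total;
    }

    // Limit applied post-concatenation: one counter shared by all instances
    // of the same field name.
    static int limitAfterConcatenation(List<List<String>> instances, int max) {
        List<String> all = new ArrayList<>();
        for (List<String> instance : instances) {
            all.addAll(instance);
        }
        return drain(new LimitingIterator<>(all.iterator(), max));
    }

    // Builds `count` field instances of `tokensEach` identical terms.
    static List<List<String>> makeInstances(int count, int tokensEach) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(Collections.nCopies(tokensEach, "term"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> foo = makeInstances(100, 1000);
        System.out.println(limitPerInstance(foo, 10_000));        // 100000
        System.out.println(limitAfterConcatenation(foo, 10_000)); // 10000
    }
}
```

With the limit applied per instance, no single instance reaches 10,000 tokens,
so all 100,000 terms get through. With one shared counter, only the first 10
instances do. The two only diverge for multi-valued fields.
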
> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the
> same functionality as MaxFieldLength provided on IndexWriter
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Shai Erera
> Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on
> IndexWriter his requested MFL limit, we can get rid of this setting entirely
> by providing an Analyzer which will wrap any other Analyzer and its
> TokenStream with a TokenFilter that keeps track of the number of tokens
> produced and stops when the limit has been reached.
> This will remove any count tracking in IW's indexing, which is done even if I
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.