[ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873397#action_12873397 ]
Michael McCandless commented on LUCENE-2295:
--------------------------------------------

bq. Further investigations showed that there is some difference between using this filter/analyzer and the current setting in IndexWriter. IndexWriter uses the given MaxFieldLength as the maximum value across all instances of the same field name. So if you add 100 fields "foo" (each with 1,000 terms) and have the default of 10,000 tokens, DocInverter will index 10 of these field instances (10,000 terms in total) and the rest will be suppressed.

In LUCENE-2450 I'm experimenting with having multi-valued fields be handled entirely by an analyzer stage, i.e., the logical concatenation of tokens (with gaps) would be "hidden" from IW, and IW would think it's dealing with a single token stream. In this model, if you then appended the new LimitTokenCountFilter to the end, I think it'd result in the same behavior as maxFieldLength today.

But, even before we eventually switch to that model... can't we still deprecate IW's maxFieldLength now (on 3.x; remove from trunk)? I realize the limiting is different (applying the limit pre vs. post concatenation), but I think the javadocs can explain this difference. I think it's unlikely apps are relying on this specific interaction of truncation and multi-valued fields...

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the
> same functionality as MaxFieldLength provided on IndexWriter
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2295
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294.
> Instead of asking the user to specify his requested MFL limit on IndexWriter,
> we can get rid of this setting entirely by providing an Analyzer which will
> wrap any other Analyzer and its TokenStream with a TokenFilter that keeps
> track of the number of tokens produced and stops when the limit has been
> reached.
> This will remove any count tracking in IW's indexing, which is done even if
> I specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
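The core of the proposal above is a filter that counts tokens produced by a wrapped stream and stops once a limit is reached. Below is a minimal, Lucene-free sketch of that counting logic, modeling a token stream as a plain Iterator<String>; the class name and shape are illustrative only (the real implementation would extend Lucene's TokenFilter and override incrementToken()):

```java
import java.util.Iterator;

// Illustrative sketch: wraps a token stream (here an Iterator<String>)
// and stops producing tokens once maxTokenCount has been reached,
// regardless of how many tokens the underlying stream could still emit.
class LimitingTokenIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final int maxTokenCount;
    private int produced = 0;

    LimitingTokenIterator(Iterator<String> input, int maxTokenCount) {
        this.input = input;
        this.maxTokenCount = maxTokenCount;
    }

    @Override
    public boolean hasNext() {
        // Cut off as soon as the limit is hit, even if input has more tokens.
        return produced < maxTokenCount && input.hasNext();
    }

    @Override
    public String next() {
        produced++;
        return input.next();
    }
}
```

Wrapping any analyzer's output this way is what lets the count tracking move out of IndexWriter: IW just consumes the (already-limited) stream and never needs to know a limit exists.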