[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842262#action_12842262
 ] 

Uwe Schindler edited comment on LUCENE-2295 at 3/7/10 9:51 AM:
---------------------------------------------------------------

Here is a first patch.

Robert and I found a bug in CharTokenizer: it does not set the endOffset 
correctly when the underlying reader is not exhausted. This is fixed in the 
patch, too. The bug in CharTokenizer also occurred when somebody used 
MaxFieldLength with IW, so there may be highlighting problems :-(

Nevertheless, CharTokenizer should be rewritten; the code is not easy to 
understand. It has too many states and branches and possibly uninitialized 
variables (I fixed this with two asserts).

Deprecating IW.MaxFieldLength is not yet included; this patch only adds the 
new Filter/Analyzer.

      was (Author: thetaphi):
    Here is a first patch.

Robert and I found a bug in CharTokenizer: it does not set the endOffset 
correctly when the underlying reader is not exhausted. This is fixed in the 
patch, too. The bug in CharTokenizer also occurred when somebody used 
MaxFieldLength with IW, so there may be highlighting problems :-(

Nevertheless, CharTokenizer should be rewritten; the code is ugly and hard to 
understand. It has too many states and branches and possibly uninitialized 
variables (I fixed this with two asserts).

Deprecating IW.MaxFieldLength is not yet included; this patch only adds the 
new Filter/Analyzer.
  
> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2295
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify the 
> requested MFL limit on IndexWriter, we can get rid of this setting entirely 
> by providing an Analyzer that wraps any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stops when the limit has been reached.
> This will remove any count tracking in IW's indexing, which is done even if 
> UNLIMITED is specified for MFL.
> Let's try to do it for 3.1.
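
The wrapping idea described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: it does not use Lucene's real
Analyzer/TokenStream/TokenFilter classes, and every type name in it
(SimpleTokenStream, SimpleAnalyzer, WhitespaceSketchAnalyzer,
LimitTokenCountSketchFilter, MaxFieldLengthSketchAnalyzer) is a hypothetical
stand-in, not the code in the attached patch. It shows a filter that counts
the tokens it passes through and reports end-of-stream once the limit is
reached, plus an analyzer that wraps any other analyzer with that filter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API. */
interface SimpleTokenStream {
    /** Returns the next token, or null when the stream is finished. */
    String next();
}

/** Hypothetical stand-in for an analyzer; NOT Lucene's Analyzer API. */
interface SimpleAnalyzer {
    SimpleTokenStream tokenStream(String text);
}

/** Splits on whitespace; present only so the sketch has something to wrap. */
final class WhitespaceSketchAnalyzer implements SimpleAnalyzer {
    @Override
    public SimpleTokenStream tokenStream(String text) {
        String trimmed = text.trim();
        List<String> parts = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.split("\\s+"));
        Iterator<String> it = parts.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}

/** Passes through at most maxTokens tokens, then reports end of stream. */
final class LimitTokenCountSketchFilter implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final int maxTokens;
    private int emitted = 0;

    LimitTokenCountSketchFilter(SimpleTokenStream input, int maxTokens) {
        this.input = input;
        this.maxTokens = maxTokens;
    }

    @Override
    public String next() {
        if (emitted >= maxTokens) {
            return null; // limit reached: stop without pulling more from input
        }
        String token = input.next();
        if (token != null) {
            emitted++; // count only tokens that were actually produced
        }
        return token;
    }
}

/** Wraps any analyzer so that its streams are cut off after maxTokens tokens. */
final class MaxFieldLengthSketchAnalyzer implements SimpleAnalyzer {
    private final SimpleAnalyzer delegate;
    private final int maxTokens;

    MaxFieldLengthSketchAnalyzer(SimpleAnalyzer delegate, int maxTokens) {
        this.delegate = delegate;
        this.maxTokens = maxTokens;
    }

    @Override
    public SimpleTokenStream tokenStream(String text) {
        return new LimitTokenCountSketchFilter(delegate.tokenStream(text), maxTokens);
    }
}
```

With this shape the consumer (the indexer, in the issue's terms) needs no
counting logic of its own: wrapping an analyzer with a limit of 3 makes a
five-word field yield three tokens and then end, while an unwrapped analyzer
pays no counting cost at all.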

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


