[ 
https://issues.apache.org/jira/browse/LUCENE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072342#comment-13072342
 ] 

Carl Austin commented on LUCENE-3326:
-------------------------------------

In case you were unaware (as the JIRA says affects 3.3) this also affects 3.2 
as I have just reproduced it.
Thanks.

> MoreLikeThis reuses a reader after it has already closed it
> -----------------------------------------------------------
>
>                 Key: LUCENE-3326
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3326
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/other
>    Affects Versions: 3.3
>            Reporter: Trejkaz
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3326.patch
>
>
> MoreLikeThis has a fatal bug whereby it tries to reuse a reader for multiple 
> fields:
> {code}
>     Map<String,Int> words = new HashMap<String,Int>();
>     for (int i = 0; i < fieldNames.length; i++) {
>         String fieldName = fieldNames[i];
>         addTermFrequencies(r, words, fieldName);
>     }
> {code}
> However, addTermFrequencies() is creating a TokenStream for this reader:
> {code}
>     TokenStream ts = analyzer.reusableTokenStream(fieldName, r);
>     int tokenCount=0;
>     // for every token
>     CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
>     ts.reset();
>     while (ts.incrementToken()) {
>         /* body omitted */
>     }
>     ts.end();
>     ts.close();
> {code}
> When it closes this analyser, it closes the underlying reader.  Then the 
> second time around the loop, you get:
> {noformat}
> Caused by: java.io.IOException: Stream closed
>       at sun.nio.cs.StreamDecoder.ensureOpen(StreamDecoder.java:27)
>       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:128)
>       at java.io.InputStreamReader.read(InputStreamReader.java:167)
>       at com.acme.util.CompositeReader.read(CompositeReader.java:101)
>       at 
> org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:803)
>       at 
> org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1010)
>       at 
> org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:178)
>       at 
> org.apache.lucene.analysis.standard.StandardFilter.incrementTokenClassic(StandardFilter.java:61)
>       at 
> org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57)
>       at 
> com.acme.storage.index.analyser.NormaliseFilter.incrementToken(NormaliseFilter.java:51)
>       at 
> org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
>       at 
> org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:931)
>       at 
> org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:1003)
>       at 
> org.apache.lucene.search.similar.MoreLikeThis.retrieveInterestingTerms(MoreLikeThis.java:1036)
> {noformat}
> My first thought was that it seems like a "ReaderFactory" of sorts should be 
> passed in so that a new Reader can be created for the second field (maybe the 
> factory could be passed the field name, so that if someone wanted to pass a 
> different reader to each, they could.)
> Interestingly, the methods taking File and URL exhibit the same issue.  I'm 
> not sure what to do about those (and we're not using them.)  The method 
> taking File could open the file twice, but the method taking a URL probably 
> shouldn't fetch the same URL twice.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to