[ https://issues.apache.org/jira/browse/LUCENE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072342#comment-13072342 ]
Carl Austin commented on LUCENE-3326:
-------------------------------------

In case you were unaware (the JIRA lists only 3.3 under Affects Versions), this also affects 3.2: I have just reproduced it there. Thanks.

> MoreLikeThis reuses a reader after it has already closed it
> -----------------------------------------------------------
>
> Key: LUCENE-3326
> URL: https://issues.apache.org/jira/browse/LUCENE-3326
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/other
> Affects Versions: 3.3
> Reporter: Trejkaz
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3326.patch
>
>
> MoreLikeThis has a fatal bug whereby it tries to reuse a reader for multiple
> fields:
> {code}
> Map<String,Int> words = new HashMap<String,Int>();
> for (int i = 0; i < fieldNames.length; i++) {
>     String fieldName = fieldNames[i];
>     addTermFrequencies(r, words, fieldName);
> }
> {code}
> However, addTermFrequencies() is creating a TokenStream for this reader:
> {code}
> TokenStream ts = analyzer.reusableTokenStream(fieldName, r);
> int tokenCount=0;
> // for every token
> CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
> ts.reset();
> while (ts.incrementToken()) {
>     /* body omitted */
> }
> ts.end();
> ts.close();
> {code}
> When it closes the TokenStream, it also closes the underlying reader.
> Then the second time around the loop, you get:
> {noformat}
> Caused by: java.io.IOException: Stream closed
>     at sun.nio.cs.StreamDecoder.ensureOpen(StreamDecoder.java:27)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:128)
>     at java.io.InputStreamReader.read(InputStreamReader.java:167)
>     at com.acme.util.CompositeReader.read(CompositeReader.java:101)
>     at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:803)
>     at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1010)
>     at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:178)
>     at org.apache.lucene.analysis.standard.StandardFilter.incrementTokenClassic(StandardFilter.java:61)
>     at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57)
>     at com.acme.storage.index.analyser.NormaliseFilter.incrementToken(NormaliseFilter.java:51)
>     at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
>     at org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:931)
>     at org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:1003)
>     at org.apache.lucene.search.similar.MoreLikeThis.retrieveInterestingTerms(MoreLikeThis.java:1036)
> {noformat}
> My first thought was that it seems like a "ReaderFactory" of sorts should be
> passed in so that a new Reader can be created for the second field (maybe the
> factory could be passed the field name, so that if someone wanted to pass a
> different reader to each, they could.)
> Interestingly, the methods taking File and URL exhibit the same issue. I'm
> not sure what to do about those (and we're not using them.) The method
> taking File could open the file twice, but the method taking a URL probably
> shouldn't fetch the same URL twice.

--
This message is automatically generated by JIRA.
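The failure described in the issue can be reproduced without Lucene, along with the two remedies the reporter mentions (a per-field "ReaderFactory", and reading a URL's content only once). This is a minimal sketch under stated assumptions, not the LUCENE-3326 patch: `countTokens` is a hypothetical stand-in for `addTermFrequencies()` that splits on whitespace instead of running an `Analyzer` and closes its `Reader` when done, as the issue says closing the TokenStream does. `ReaderFactory` is likewise a hypothetical interface, not a Lucene API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class SharedReaderDemo {

    // Hypothetical interface modelled on the "ReaderFactory" idea in the issue;
    // not part of Lucene. The field name lets callers supply a different Reader per field.
    interface ReaderFactory {
        Reader open(String fieldName) throws IOException;
    }

    // Hypothetical stand-in for addTermFrequencies(): splits on whitespace instead
    // of running an Analyzer, and closes the Reader when finished.
    static void countTokens(Reader r, Map<String, Integer> words, String fieldName)
            throws IOException {
        try (BufferedReader br = new BufferedReader(r)) { // closing br also closes r
            String line;
            while ((line = br.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.merge(fieldName + ":" + token, 1, Integer::sum);
                    }
                }
            }
        }
    }

    // The pattern from the issue: one Reader reused for every field.
    // The second iteration reads from a closed stream and throws IOException.
    static Map<String, Integer> sharedReader(Reader r, String[] fieldNames)
            throws IOException {
        Map<String, Integer> words = new HashMap<>();
        for (String fieldName : fieldNames) {
            countTokens(r, words, fieldName);
        }
        return words;
    }

    // Remedy 1: obtain a fresh Reader for each field from a factory.
    static Map<String, Integer> perFieldReader(ReaderFactory factory, String[] fieldNames)
            throws IOException {
        Map<String, Integer> words = new HashMap<>();
        for (String fieldName : fieldNames) {
            countTokens(factory.open(fieldName), words, fieldName);
        }
        return words;
    }

    // Remedy 2: drain the source once (e.g. a URL that should not be fetched
    // twice), then replay the buffered text for each field.
    static Map<String, Integer> bufferedOnce(Reader r, String[] fieldNames)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader in = r) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        String text = sb.toString();
        return perFieldReader(fieldName -> new StringReader(text), fieldNames);
    }

    public static void main(String[] args) throws IOException {
        String[] fields = {"title", "body"};
        String doc = "lucene more like this";
        try {
            sharedReader(new StringReader(doc), fields);
            System.out.println("shared reader: no error");
        } catch (IOException e) {
            System.out.println("shared reader failed: " + e.getMessage());
        }
        System.out.println("per-field factory terms: "
                + perFieldReader(fieldName -> new StringReader(doc), fields).size());
        System.out.println("buffered-once terms: "
                + bufferedOnce(new StringReader(doc), fields).size());
    }
}
```

Running `main` should show the shared reader failing on the second field, while both remedies count four terms for each of the two fields (eight map entries). The factory keeps streaming but requires the caller to be able to reopen the source, which suits the File variant. Buffering trades memory for a single read, which matches the concern that the URL variant should not fetch the same URL twice.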