[
https://issues.apache.org/jira/browse/LUCENE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066845#comment-13066845
]
Uwe Schindler commented on LUCENE-3326:
---------------------------------------
+1 nuke default charset shit!
> MoreLikeThis reuses a reader after it has already closed it
> -----------------------------------------------------------
>
> Key: LUCENE-3326
> URL: https://issues.apache.org/jira/browse/LUCENE-3326
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/other
> Affects Versions: 3.3
> Reporter: Trejkaz
> Attachments: LUCENE-3326.patch
>
>
> MoreLikeThis has a fatal bug whereby it tries to reuse a reader for multiple
> fields:
> {code}
> Map<String,Int> words = new HashMap<String,Int>();
> for (int i = 0; i < fieldNames.length; i++) {
> String fieldName = fieldNames[i];
> addTermFrequencies(r, words, fieldName);
> }
> {code}
> However, addTermFrequencies() is creating a TokenStream for this reader:
> {code}
> TokenStream ts = analyzer.reusableTokenStream(fieldName, r);
> int tokenCount=0;
> // for every token
> CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
> ts.reset();
> while (ts.incrementToken()) {
> /* body omitted */
> }
> ts.end();
> ts.close();
> {code}
> When it closes this analyser, it closes the underlying reader. Then the
> second time around the loop, you get:
> {noformat}
> Caused by: java.io.IOException: Stream closed
> at sun.nio.cs.StreamDecoder.ensureOpen(StreamDecoder.java:27)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:128)
> at java.io.InputStreamReader.read(InputStreamReader.java:167)
> at com.acme.util.CompositeReader.read(CompositeReader.java:101)
> at
> org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:803)
> at
> org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1010)
> at
> org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:178)
> at
> org.apache.lucene.analysis.standard.StandardFilter.incrementTokenClassic(StandardFilter.java:61)
> at
> org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57)
> at
> com.acme.storage.index.analyser.NormaliseFilter.incrementToken(NormaliseFilter.java:51)
> at
> org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
> at
> org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:931)
> at
> org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:1003)
> at
> org.apache.lucene.search.similar.MoreLikeThis.retrieveInterestingTerms(MoreLikeThis.java:1036)
> {noformat}
> My first thought was that it seems like a "ReaderFactory" of sorts should be
> passed in so that a new Reader can be created for the second field (maybe the
> factory could be passed the field name, so that if someone wanted to pass a
> different reader to each, they could.)
> Interestingly, the methods taking File and URL exhibit the same issue. I'm
> not sure what to do about those (and we're not using them.) The method
> taking File could open the file twice, but the method taking a URL probably
> shouldn't fetch the same URL twice.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]