WriteLineDocTask incorrectly normalizes fields
----------------------------------------------
                 Key: LUCENE-1714
                 URL: https://issues.apache.org/jira/browse/LUCENE-1714
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/benchmark
            Reporter: Shai Erera
             Fix For: 2.9

WriteLineDocTask normalizes the body, title and date fields by replacing any "\t" with a space. However, if any of them contains a newline, LineDocMaker will fail: the first line it reads holds only part of the text, and the second line, which it expects to be a new document, holds the rest.

I don't know how we haven't hit this so far. Maybe the Wikipedia text doesn't have such lines, but when I ran over the TREC collection I hit a lot of them.

I will attach a patch shortly.
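To make the failure mode concrete, here is a minimal sketch of the kind of normalization the report implies is needed: replace newlines as well as tabs before writing a document as a single line. This is not the attached patch. The class name LineFieldNormalizer, its methods, and the title/date/body field order are assumptions made for illustration only.

```java
// Hypothetical helper for illustration; NOT the actual LUCENE-1714 patch.
final class LineFieldNormalizer {

    // Assumed field separator for the one-document-per-line format.
    static final char SEP = '\t';

    private LineFieldNormalizer() {}

    /**
     * Replaces every tab, carriage return and line feed with a single space,
     * so the field can neither introduce an extra separator nor break the line.
     * Note that a "\r\n" pair becomes two spaces.
     */
    static String normalize(String field) {
        if (field == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            sb.append(c == '\t' || c == '\r' || c == '\n' ? ' ' : c);
        }
        return sb.toString();
    }

    /** Joins the normalized fields into one line (field order is an assumption). */
    static String toLine(String title, String date, String body) {
        return normalize(title) + SEP + normalize(date) + SEP + normalize(body);
    }
}
```

With only tabs replaced, a body such as "first\nsecond" would be written as two physical lines, and a line-oriented reader would treat "second" as the start of a new, malformed document. With the sketch above, toLine always yields exactly one line containing three tab-separated fields.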