On 01/26/2014 09:59 AM, Jörn Kottmann wrote:
>
> The evaluation should ignore whitespace. I have now committed my fix;
> it would be nice if you could test it.
>
> There might still be something wrong. In my test data I replaced all
> question marks with whitespace, and the result is slightly worse than
> with the original data.
>
> Jörn
Yes, this fixes the whitespace sentence issue, but the evaluation issue
remains. I believe the problem is in SentenceSampleStream: in the
following block, the sentence is trimmed before the <LF> escape tag is
replaced with a \n character. The newline produced from a trailing <LF>
is therefore never trimmed, so test sentences that ended with <LF> get
spans that are one character longer than they should be.
    sentence = sentence.trim();
    sentence = replaceNewLineEscapeTags(sentence);
    sentencesString.append(sentence);
    int end = sentencesString.length();
    sentenceSpans.add(new Span(begin, end));
    sentencesString.append(' ');
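A minimal, self-contained sketch of the effect described above, and of one possible fix (expanding the escape tags before trimming). It assumes <LF> expands to a single '\n'. The replaceNewLineEscapeTags below is a hypothetical stand-in, not the OpenNLP method; spans are plain int[] pairs instead of opennlp.tools.util.Span; SentenceSpanOrderDemo and buildSpans are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSpanOrderDemo {

    // Hypothetical stand-in for SentenceSampleStream's escape-tag handling;
    // assumes the tag is the literal "<LF>" and maps to a single '\n'.
    static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n");
    }

    // Concatenates the sentences (separated by one space) into `text` and
    // records a [begin, end) offset pair for each sentence.
    // trimFirst = true mimics the order in the quoted block (trim, then replace).
    static List<int[]> buildSpans(List<String> sentences, StringBuilder text, boolean trimFirst) {
        List<int[]> spans = new ArrayList<>();
        for (String sentence : sentences) {
            int begin = text.length();
            if (trimFirst) {
                sentence = sentence.trim();
                sentence = replaceNewLineEscapeTags(sentence);
            } else {
                // Expand escape tags first so trim() also strips the resulting newline.
                sentence = replaceNewLineEscapeTags(sentence);
                sentence = sentence.trim();
            }
            text.append(sentence);
            spans.add(new int[] {begin, text.length()});
            text.append(' ');
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("First sentence.<LF>", "Second one.");
        List<int[]> trimFirst = buildSpans(sentences, new StringBuilder(), true);
        List<int[]> replaceFirst = buildSpans(sentences, new StringBuilder(), false);
        int lenTrimFirst = trimFirst.get(0)[1] - trimFirst.get(0)[0];
        int lenReplaceFirst = replaceFirst.get(0)[1] - replaceFirst.get(0)[0];
        System.out.println(lenTrimFirst + " vs " + lenReplaceFirst);
    }
}
```

Running main prints "16 vs 15": with the trim-then-replace order, the first span includes the trailing newline, so it is one character longer than the 15-character sentence.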