WikipediaTokenizer incorrectly splits certain syntax into multiple tokens
-------------------------------------------------------------------------
Key: LUCENE-1141
URL: https://issues.apache.org/jira/browse/LUCENE-1141
Project: Lucene - Java
Issue Type: Bug
Components: contrib/wikipedia
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor

WikipediaTokenizer incorrectly splits tokens that have italic/bold markup inside the token. For instance, '''F'''oo is a bold Foo and should yield one token, Foo, not the two tokens F and oo.
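To make the expected behaviour concrete, here is a minimal sketch, assuming that runs of two or more apostrophes (MediaWiki italic/bold markup) are simply dropped and that tokens are otherwise split on whitespace. The class `WikiMarkupJoinSketch` and its `tokenize` method are hypothetical illustrations; they are not part of Lucene and do not reproduce WikipediaTokenizer's actual implementation or its token types.

```java
import java.util.ArrayList;
import java.util.List;

public class WikiMarkupJoinSketch {
    // Hypothetical helper, not Lucene's API: splits on whitespace and skips
    // runs of 2+ apostrophes (bold/italic markup) WITHOUT ending the current
    // token, so '''F'''oo becomes the single token "Foo".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\'') {
                // Measure the length of the apostrophe run.
                int j = i;
                while (j < text.length() && text.charAt(j) == '\'') {
                    j++;
                }
                if (j - i == 1) {
                    // A lone apostrophe is ordinary text (e.g. "don't").
                    current.append('\'');
                }
                // Runs of two or more are formatting markup: drop them.
                i = j;
                continue;
            }
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
            i++;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Foo, is, bold, text]
        System.out.println(tokenize("'''F'''oo is ''bold'' text"));
    }
}
```

The sketch discards the markup entirely; a real fix would presumably still need to record that the token carried bold or italic formatting, which this example does not attempt.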