[ https://issues.apache.org/jira/browse/LUCENE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated LUCENE-1141:
------------------------------------

    Attachment: LUCENE-1141-test.patch

Here's a test case for the problem.

> WikipediaTokenizer incorrectly splits certain syntax into multiple tokens
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-1141
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1141
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/wikipedia
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1141-test.patch
>
>
> WikipediaTokenizer incorrectly splits tokens that have italics/bold markup
> inside the token. For instance, '''F'''oo is a bold Foo, not the two
> tokens F and oo.
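For illustration, here is a minimal Java sketch of the expected behavior. It assumes inline MediaWiki markup (runs of 2 to 5 apostrophes) is simply stripped before splitting on whitespace, so markup inside a word no longer acts as a token boundary. The class WikiMarkupJoin and its methods are hypothetical. This is not WikipediaTokenizer's API or the fix in the attached patch, and it discards the bold/italic token types the real tokenizer reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class WikiMarkupJoin {

    // Removes runs of 2 to 5 apostrophes, which MediaWiki uses for
    // italics (''), bold ('''), and bold italics ('''''). A single
    // apostrophe, as in "don't", is left untouched.
    static String stripInlineMarkup(String text) {
        return text.replaceAll("'{2,5}", "");
    }

    // Splits on whitespace only after the markup is gone, so
    // "'''F'''oo" yields one token "Foo" instead of "F" and "oo".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : stripInlineMarkup(text).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Foo, is, a, bold, Foo]
        System.out.println(tokenize("'''F'''oo is a bold Foo"));
    }
}
```

A real fix would need to keep tracking the bold/italic state so the merged token can still be typed as bold, which this sketch does not attempt.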