[ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199390#comment-13199390 ]
Walter Underwood commented on LUCENE-3748:
------------------------------------------

Why make separate patches for individual characters instead of using Unicode normalization? Converting to NFKC would also solve this for the prime character (U+2032) and any other equivalent code point. Compatibility normalization is designed for precisely this purpose: equivalence that ignores appearance.

> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-3748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3748
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Assignee: Robert Muir
>            Priority: Minor
>         Attachments: LucenePatch, Patch-Lucene-3748
>
>
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes
> possessives using only the '\'' character (plus 's' or 'S'), but some common
> systems (German?) insert the Unicode "\u2019" (RIGHT SINGLE QUOTATION MARK)
> instead, and this is not removed when processing UTF-8 text. I propose to
> change EnglishPossessiveFilter to support '\u2019' as an alternative to '\''.
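
For illustration, the change proposed in the issue can be sketched outside Lucene as a plain string helper. This is not the attached patch and not the actual filter code: the class and method names (PossessiveSketch, isApostrophe, stripPossessive) are hypothetical, and the sketch assumes the rule described above, namely that a trailing apostrophe followed by 's' or 'S' is dropped, with U+2019 accepted alongside the ASCII apostrophe.

```java
public final class PossessiveSketch {
    private PossessiveSketch() {}

    // Assumption: only the ASCII apostrophe and U+2019 (RIGHT SINGLE QUOTATION MARK)
    // count as possessive markers, as proposed in the issue description.
    public static boolean isApostrophe(char c) {
        return c == '\'' || c == '\u2019';
    }

    // Drops a trailing apostrophe + 's' or 'S'; returns the term unchanged otherwise.
    public static String stripPossessive(String term) {
        int len = term.length();
        if (len >= 2
                && isApostrophe(term.charAt(len - 2))
                && (term.charAt(len - 1) == 's' || term.charAt(len - 1) == 'S')) {
            return term.substring(0, len - 2);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("John's"));       // ASCII apostrophe
        System.out.println(stripPossessive("John\u2019s"));  // typographic apostrophe
        System.out.println(stripPossessive("Johns"));        // no possessive marker
    }
}
```

Keeping the character test in one small method means further apostrophe-like code points could be added in one place, if the project decides to handle them individually rather than through normalization.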