[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653010#action_12653010 ]
Robert Muir commented on LUCENE-1390:
-------------------------------------

Thanks guys. Just as a comment to whoever is listening: I think this is very useful functionality.

I am indexing a lot of docs, and doing it with ICU works well, but that method (Unicode decomposition etc.) is very expensive and still doesn't handle many common cases. In profiling, it was slowing down the entire indexing process.

The existing ISO filter doesn't handle many cases that are actually in use in my text, but this filter works well and appears to cover most of the common cases, such as full-width forms, while also being fast.

Thanks,
Robert

> add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
> ------------------------------------------------------------
>
>                 Key: LUCENE-1390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1390
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>         Environment: any
>            Reporter: Andi Vajda
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch,
> ISOLatinAccentFilter.java
>
>
> The ISOLatin1AccentFilter removes accents from accented characters in the
> ISO Latin 1 character set.
> It does what it does and there is no bug with it.
> It would be nicer, though, if there were a more comprehensive version of this
> code that included not just ISO-Latin-1 (ISO-8859-1) but the entire Latin 1
> and Latin Extended A Unicode blocks.
> See: http://en.wikipedia.org/wiki/Latin-1_Supplement_unicode_block
> See: http://en.wikipedia.org/wiki/Latin_Extended-A_unicode_block
> That way, all languages using roman characters are covered.
> A new class, ISOLatinAccentFilter, is attached. It is intended to supersede
> ISOLatin1AccentFilter, which should get deprecated.
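
For readers of the archive, the trade-off described in the comment above can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name FoldingSketch, both method names, and the handful of mapped characters are hypothetical, chosen only to contrast the two approaches. The first method uses the JDK's java.text.Normalizer (standing in for ICU) to decompose and then drop combining marks; the second maps characters directly, which avoids the normalization pass and can also cover characters that have no canonical decomposition.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not the attached filter.
class FoldingSketch {

    // Approach 1: canonical decomposition (NFD), then drop combining marks.
    // Characters without a canonical decomposition (e.g. U+00F8, fullwidth
    // forms) pass through unchanged.
    static String foldByDecomposition(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Approach 2: direct per-character mapping. Only a few sample
    // characters are listed; a real filter would need a far larger table.
    static String foldByTable(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {            // plain ASCII: fast path
                out.append(c);
                continue;
            }
            switch (c) {
                case '\u00E9': out.append('e'); break;   // e with acute (Latin-1 Supplement)
                case '\u00FC': out.append('u'); break;   // u with diaeresis
                case '\u0159': out.append('r'); break;   // r with caron (Latin Extended-A)
                case '\u00F8': out.append('o'); break;   // o with stroke, no decomposition
                case '\u00DF': out.append("ss"); break;  // sharp s
                default:
                    if (c >= '\uFF01' && c <= '\uFF5E') {
                        // Fullwidth ASCII variants sit at a fixed offset from ASCII.
                        out.append((char) (c - 0xFEE0));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "caf\u00E9 \u00F8l \uFF21\uFF22";
        System.out.println(foldByDecomposition(sample));
        System.out.println(foldByTable(sample));
    }
}
```

In this sketch the decomposition approach leaves U+00F8 and the fullwidth letters untouched, while the table approach folds them, which matches the kind of coverage gap the comment mentions.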