Unicode normalization
---------------------

                 Key: LUCY-191
                 URL: https://issues.apache.org/jira/browse/LUCY-191
             Project: Lucy
          Issue Type: New Feature
          Components: Analysis
            Reporter: Nick Wellnhofer
            Priority: Minor


As discussed on the mailing list, it would be nice to have Unicode
normalization, Unicode case folding, and accent stripping as part of the
analyzer chain. With the help of utf8proc, all three can be done in one pass.
I have proposed a new analyzer, Lucy::Analyzer::Normalizer, with the interface
described here:

http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201111.mbox/%3C4EC43816.1070107%40aevum.de%3E
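
To make the three operations concrete, here is a rough sketch using Python's
standard unicodedata module. It is not utf8proc and not the proposed Lucy
interface: the function name normalize_token and its parameters are
hypothetical, chosen only for illustration, and the steps run as separate
passes rather than the single pass utf8proc allows.

```python
import unicodedata


def normalize_token(text, form="NFKC", case_fold=True, strip_accents=False):
    """Hypothetical helper: normalize, optionally fold case and strip accents.

    Illustration only; this is not the Lucy or utf8proc API.
    """
    if strip_accents:
        # Decompose so that accents become separate combining marks,
        # then drop every character in a Unicode "Mark" category.
        decomposed_form = "NFKD" if form in ("NFKC", "NFKD") else "NFD"
        decomposed = unicodedata.normalize(decomposed_form, text)
        text = "".join(
            ch for ch in decomposed
            if not unicodedata.category(ch).startswith("M")
        )
    if case_fold:
        # str.casefold() applies Unicode full case folding.
        text = text.casefold()
    # Apply the requested normalization form last, since case folding
    # can produce text that is no longer normalized.
    return unicodedata.normalize(form, text)
```

For example, normalize_token("Café", strip_accents=True) yields "cafe", and
normalize_token("Straße") yields "strasse", so differently written variants of
a term end up as the same token in the index.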


