[ https://issues.apache.org/jira/browse/SLING-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carsten Ziegeler resolved SLING-2609.
-------------------------------------
Resolution: Won't Fix
> Support non-ASCII based languages for node name generation
> ----------------------------------------------------------
>
> Key: SLING-2609
> URL: https://issues.apache.org/jira/browse/SLING-2609
> Project: Sling
> Issue Type: Improvement
> Components: Servlets
> Affects Versions: Servlets Post 2.1.2
> Reporter: Felix Meschberger
> Assignee: Felix Meschberger
> Priority: Major
> Attachments: NodeNameFilter.java, SLING-2609.patch
>
>
> The Sling POST Servlet has built-in support to automatically generate names
> for newly created resources based on a name hint or on the values of
> selected properties.
> Such name hints are filtered in a very crude way, though:
> * the string is converted to lower case
> * only ASCII letters and digits are accepted
> * all other characters are replaced by underscore (_)
> This leads to the following problems:
> * All non-ASCII characters, including non-BMP (surrogate pair) characters,
> are converted to underscores
> * Words separated by whitespace (e.g. in the title "My Brand new Page") end
> up separated by underscores instead of dashes (-), which may lead to
> indexing problems (see http://www.youtube.com/watch?v=AQcSFsQyct8)
> This all happens in the NodeNameFilter class.
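For illustration, the three filtering rules listed above can be sketched in Java. This is a hypothetical reconstruction based only on those rules. It is not the actual NodeNameFilter source, and the class name CrudeNameFilter and its filter method are invented for this example.

```java
import java.util.Locale;

// Hypothetical reconstruction of the crude filtering rules described in the
// issue; this is NOT the actual Sling NodeNameFilter implementation.
final class CrudeNameFilter {

    private CrudeNameFilter() {
    }

    /** Lower-cases the hint, keeps ASCII letters and digits, replaces every other char with '_'. */
    static String filter(String hint) {
        String lower = hint.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        // Iterates over UTF-16 char values, so a surrogate pair is seen as two chars.
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            boolean accepted = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            sb.append(accepted ? c : '_');
        }
        return sb.toString();
    }
}
```

With this sketch, `CrudeNameFilter.filter("My Brand new Page")` returns `"my_brand_new_page"`. A single non-BMP character such as U+1F600 becomes `"__"` because its surrogate pair is processed as two separate `char` values.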
> I suggest we change this as follows:
> * Operate on code points (int type) instead of just characters (char type)
> * Accept all characters valid in JCR names, that is, all Unicode characters
> except { ., /, :, [, ], *, ', ", | }; these characters are replaced by
> underscore
> * Convert all whitespace characters (Character.isWhitespace(int)) to dashes
> * Convert all other characters to lower case (Character.toLowerCase(int))
> * Fold runs of consecutive dash and underscore characters into a single
> character
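The proposed rules could look roughly like the following Java sketch. It assumes the rules are applied per code point in the order listed. It also assumes a run of dashes and underscores is folded into its first character, because the proposal does not say which character is kept. NodeNameFilterSketch and its filter method are hypothetical names, not Sling API.

```java
// Sketch of the proposed filtering rules under the assumptions stated above;
// NodeNameFilterSketch is a hypothetical name, not part of the Sling API.
final class NodeNameFilterSketch {

    // Characters the proposal lists as invalid in JCR names; they become '_'.
    private static final String INVALID = "./:[]*'\"|";

    private NodeNameFilterSketch() {
    }

    static String filter(String hint) {
        StringBuilder sb = new StringBuilder(hint.length());
        int i = 0;
        while (i < hint.length()) {
            // Read a full code point, so surrogate pairs are handled as one character.
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int mapped;
            if (INVALID.indexOf(cp) >= 0) {
                mapped = '_';
            } else if (Character.isWhitespace(cp)) {
                mapped = '-';
            } else {
                mapped = Character.toLowerCase(cp);
            }

            // Fold runs of '-' and '_': keep only the first character of a run.
            if ((mapped == '-' || mapped == '_') && sb.length() > 0) {
                char last = sb.charAt(sb.length() - 1);
                if (last == '-' || last == '_') {
                    continue;
                }
            }
            sb.appendCodePoint(mapped);
        }
        return sb.toString();
    }
}
```

Under these assumptions, `filter("My Brand new Page")` yields `"my-brand-new-page"` and `filter("a: b")` yields `"a_b"`, while non-BMP characters are kept intact. `Character.toLowerCase(int)` performs a simple one-to-one mapping. It therefore does not reproduce context-sensitive rules that `String.toLowerCase` applies, such as the Greek final sigma.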
--
This message was sent by Atlassian Jira
(v8.20.10#820010)