Felix Meschberger created SLING-2609:
----------------------------------------
Summary: Support non-ASCII based languages for node name generation
Key: SLING-2609
URL: https://issues.apache.org/jira/browse/SLING-2609
Project: Sling
Issue Type: Improvement
Components: Servlets
Affects Versions: Servlets Post 2.1.2
Reporter: Felix Meschberger
Assignee: Felix Meschberger
The Sling POST Servlet has built-in support to automatically generate names for
newly created resources based on a name hint or the value of selected
properties.
Such name hints are filtered in a very crude way, though:
* the string is converted to lower case
* only ASCII letters and digits are accepted
* all other characters are replaced by underscore (_)
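For illustration, the three rules above can be sketched as follows. This is a
hypothetical stand-in, not the actual NodeNameFilter source: the class and
method names are made up, and any behaviour of the real class beyond the three
listed rules is deliberately left out.

```java
import java.util.Locale;

// Hypothetical sketch of the current, char-based filtering rules.
public final class CrudeNameFilterSketch {

    private CrudeNameFilterSketch() {}

    public static String filter(String hint) {
        // Rule 1: convert the whole string to lower case.
        String lower = hint.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            // Rule 2: only ASCII letters and digits are accepted.
            boolean accepted = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            // Rule 3: everything else becomes an underscore.
            out.append(accepted ? c : '_');
        }
        return out.toString();
    }
}
```

With this sketch, the title "My Brand new Page" becomes "my_brand_new_page",
and every non-ASCII character is lost to an underscore.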
This leads to the following problems:
* Non-BMP (surrogate) Unicode characters are converted to just underscores
* Words separated by whitespace (e.g. the title "My Brand new Page") are now
separated by underscore instead of dash (-), which may lead to indexing problems
(see http://www.youtube.com/watch?v=AQcSFsQyct8)
This all happens in the NodeNameFilter class.
I suggest we change this as follows:
* Operate on code points (int type) instead of just characters (char type)
* Accept all characters valid for JCR names. These are all Unicode characters
except { ., /, :, [, ], *, ', ", | }. These excluded characters are replaced by
underscore
* Convert all white space characters (Character.isWhitespace(int)) to dash
* Convert all other characters to lower case (Character.toLowerCase(int))
* Fold consecutive dash and underscore characters into just one
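A minimal sketch of the proposed rules, to make the intent concrete. The class
and method names are hypothetical and this is not a patch for NodeNameFilter.
One point is an assumption of the sketch: when a run mixes dashes and
underscores, the first character of the run is the one kept.

```java
// Hypothetical sketch of the proposed code-point-based filtering.
public final class NodeNameHintSketch {

    // Characters listed in this issue as not valid in JCR names.
    private static final String INVALID = "./:[]*'\"|";

    private NodeNameHintSketch() {}

    public static String filter(String hint) {
        StringBuilder out = new StringBuilder(hint.length());
        boolean lastWasSeparator = false;
        for (int i = 0; i < hint.length(); ) {
            // Read a full code point so surrogate pairs stay together.
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int mapped;
            if (Character.isWhitespace(cp)) {
                mapped = '-';                       // white space becomes dash
            } else if (INVALID.indexOf(cp) >= 0) {
                mapped = '_';                       // invalid JCR name character
            } else {
                mapped = Character.toLowerCase(cp); // everything else: lower case
            }

            // Fold runs of '-' and '_' into one (assumption: keep the first).
            boolean isSeparator = mapped == '-' || mapped == '_';
            if (isSeparator && lastWasSeparator) {
                continue;
            }
            out.appendCodePoint(mapped);
            lastWasSeparator = isSeparator;
        }
        return out.toString();
    }
}
```

With this sketch, "My Brand new Page" becomes "my-brand-new-page", and
non-ASCII as well as non-BMP characters pass through unchanged apart from
lower-casing.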