Felix Meschberger created SLING-2609:
----------------------------------------

             Summary: Support non-ASCII based languages for node name generation
                 Key: SLING-2609
                 URL: https://issues.apache.org/jira/browse/SLING-2609
             Project: Sling
          Issue Type: Improvement
          Components: Servlets
    Affects Versions: Servlets Post 2.1.2
            Reporter: Felix Meschberger
            Assignee: Felix Meschberger


The Sling POST Servlet has built-in support for automatically generating names for 
newly created resources based on a name hint or on the value of selected 
properties.

Such name hints are filtered in a very crude way, though:
  * the string is converted to lower case
  * only ASCII letters and digits are accepted
  * all other characters are replaced by underscore (_)
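
To make the effect of these three rules concrete, here is a minimal sketch of 
such a filter. It is not the actual NodeNameFilter source: the class name 
CrudeNameFilterSketch and its filter method are made up for illustration, and 
the sketch assumes that runs of underscores are not folded.

```java
// Illustrative sketch only; this is NOT the actual NodeNameFilter code.
// Class and method names are hypothetical.
final class CrudeNameFilterSketch {

    /** Applies the three rules above, one char (UTF-16 code unit) at a time. */
    static String filter(String hint) {
        StringBuilder sb = new StringBuilder(hint.length());
        for (int i = 0; i < hint.length(); i++) {
            char c = hint.charAt(i);
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                sb.append(c);                         // accepted as is
            } else if (c >= 'A' && c <= 'Z') {
                sb.append(Character.toLowerCase(c));  // ASCII upper case -> lower case
            } else {
                sb.append('_');                       // everything else, incl. whitespace
            }
        }
        return sb.toString();
    }
}
```

With this sketch the title "My Brand new Page" becomes "my_brand_new_page", 
and every non-ASCII letter in a hint is lost to an underscore.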

This leads to the following problems:
  * Non-BMP (surrogate) Unicode characters are converted to just underscore
  * Words separated by whitespace (e.g. the title "My Brand new Page") are now 
separated by underscore instead of dash (-), which may lead to indexing problems 
(see http://www.youtube.com/watch?v=AQcSFsQyct8)

This all happens in the NodeNameFilter class.

I suggest we change this as follows:

* Operate on code points (int type) instead of just characters (char type)
* Accept all characters valid for JCR names. These are all Unicode characters 
except { ., /, :, [, ], *, ', ", | }. The excluded characters are replaced by 
underscore
* Convert all white space characters (Character.isWhitespace(int)) to dash
* Convert all other characters to lower case (Character.toLowerCase(int))
* Fold consecutive dash and underscore characters into just one
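
The five rules above could be sketched roughly as follows. This is only an 
illustration of the proposal, not a patch: the class name 
CodePointNameFilterSketch and its filter method are hypothetical, and the 
sketch assumes that a mixed run of dashes and underscores is folded into the 
first separator of the run, which the list above leaves open.

```java
// Illustrative sketch of the proposed rules; names are hypothetical.
final class CodePointNameFilterSketch {

    // Characters the proposal replaces by underscore: . / : [ ] * ' " |
    private static final String REPLACED = "./:[]*'\"|";

    static String filter(String hint) {
        StringBuilder sb = new StringBuilder(hint.length());
        int last = -1; // last code point appended, -1 if nothing yet

        for (int i = 0; i < hint.length(); ) {
            // Read a full code point so surrogate pairs stay together.
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int out;
            if (Character.isWhitespace(cp)) {
                out = '-';
            } else if (REPLACED.indexOf(cp) >= 0) {
                out = '_';
            } else {
                out = Character.toLowerCase(cp);
            }

            // Assumption: any run of '-' and '_' collapses to its first member.
            boolean isSeparator = out == '-' || out == '_';
            boolean lastWasSeparator = last == '-' || last == '_';
            if (isSeparator && lastWasSeparator) {
                continue;
            }
            sb.appendCodePoint(out);
            last = out;
        }
        return sb.toString();
    }
}
```

With this sketch "My Brand new Page" becomes "my-brand-new-page", and 
non-ASCII as well as non-BMP characters pass through unchanged apart from 
lower-casing.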

