[ https://issues.apache.org/jira/browse/SLING-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Meschberger updated SLING-2609:
-------------------------------------

    Attachment: SLING-2609.patch

Proposed patch:
  * Implements new filtering
  * Folds NodeNameFilter into DefaultNodeNameGenerator
                
> Support non-ASCII based languages for node name generation
> ----------------------------------------------------------
>
>                 Key: SLING-2609
>                 URL: https://issues.apache.org/jira/browse/SLING-2609
>             Project: Sling
>          Issue Type: Improvement
>          Components: Servlets
>    Affects Versions: Servlets Post 2.1.2
>            Reporter: Felix Meschberger
>            Assignee: Felix Meschberger
>         Attachments: NodeNameFilter.java, SLING-2609.patch
>
>
> The Sling POST Servlet has built-in support for automatically generating names
> for newly created resources based on a name hint or on the values of selected
> properties.
> Such name hints are filtered in a very crude way, though:
>   * the string is converted to lower case
>   * only ASCII letters and digits are accepted
>   * all other characters are replaced by an underscore (_)
> This leads to the following problems:
>   * Non-BMP (surrogate pair) Unicode characters are converted to just an underscore
>   * Words separated by whitespace (e.g. in the title "My Brand new Page") end up
> separated by underscores instead of dashes (-), which may lead to indexing
> problems (see http://www.youtube.com/watch?v=AQcSFsQyct8)
> This all happens in the NodeNameFilter class.
> I suggest we change this as follows:
>   * Operate on code points (int type) instead of just characters (char type)
>   * Accept all characters valid for JCR names. These are all Unicode characters
> except { ., /, :, [, ], *, ', ", | }, which are replaced by an underscore
>   * Convert all whitespace characters (Character.isWhitespace(int)) to a dash
>   * Convert all other characters to lower case (Character.toLowerCase(int))
>   * Fold runs of consecutive dash and underscore characters into just one
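The proposed rules can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code in SLING-2609.patch: the class and method names (NodeNameFilterSketch, filter) are hypothetical, and since the proposal does not say which character survives when a run of dashes and underscores is folded, the sketch assumes the first one in the run is kept.

```java
public final class NodeNameFilterSketch {

    // Characters the proposal lists as invalid in JCR names; each becomes '_'.
    private static final String INVALID_CHARS = "./:[]*'\"|";

    private NodeNameFilterSketch() {
    }

    /** Applies the proposed rules to a name hint (hypothetical helper, illustration only). */
    public static String filter(String hint) {
        StringBuilder sb = new StringBuilder(hint.length());
        int i = 0;
        while (i < hint.length()) {
            // Read a whole code point so a surrogate pair is treated as one character.
            int cp = hint.codePointAt(i);
            i += Character.charCount(cp);

            int mapped;
            if (INVALID_CHARS.indexOf(cp) >= 0) {
                mapped = '_';                       // invalid JCR name character
            } else if (Character.isWhitespace(cp)) {
                mapped = '-';                       // whitespace becomes a dash
            } else {
                mapped = Character.toLowerCase(cp); // everything else is lower-cased
            }

            // Assumption: in a run of '-' and '_', keep only the first character.
            if ((mapped == '-' || mapped == '_') && sb.length() > 0) {
                char last = sb.charAt(sb.length() - 1);
                if (last == '-' || last == '_') {
                    continue;
                }
            }
            sb.appendCodePoint(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("My Brand new Page")); // my-brand-new-page
        System.out.println(filter("Report: Q1 / Q2"));   // report_q1-q2
    }
}
```

Iterating with codePointAt and Character.charCount keeps a surrogate pair together, so a non-BMP character passes through intact instead of being split into two chars, and non-ASCII letters such as "Ü" are lower-cased rather than replaced.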

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
