I don't know of a library or built-in that would handle that. But you
could write one. If you do, try to release the source.
In another direction, one fairly cheap solution might be to check the
user-name before creating the user, or try-catch the library call. If it
looks iffy or fails, apologize to the user and ask them to asciify it in
their own preferred way. That way there are no surprises when the user sees
møøse automatically (and irrevocably?) translated to moeoese or moose.
People are sometimes very sensitive about these things, so either variant
might annoy someone.
A pre-flight check could use fn:matches with the pattern from security.xsd:
  <xs:simpleType name="user-name">
    <xs:annotation>
      <xs:documentation>
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <xs:pattern value="[a-zA-Z0-9._@-]+"/>
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
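As a rough illustration of that pre-flight check, here is a minimal sketch
in Python rather than XQuery, so treat it as a model of the logic and not
as MarkLogic code. The function name is_valid_user_name is made up for this
example. One thing to watch when porting: an XSD pattern must match the
whole value, while fn:matches looks for a match anywhere in the string, so
an XQuery version would need explicit ^ and $ anchors.

```python
import re

# Same character class as the xs:pattern quoted from security.xsd.
USER_NAME_PATTERN = re.compile(r"[a-zA-Z0-9._@-]+")


def is_valid_user_name(name: str) -> bool:
    """Return True if the whole string consists of allowed characters.

    fullmatch() mirrors the implicit anchoring of an XSD pattern; the
    '+' quantifier also enforces the minLength of 1.
    """
    return USER_NAME_PATTERN.fullmatch(name) is not None
```

If the check fails, that is the point at which to apologize and ask the
user for an ASCII spelling of their own choosing.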
Longer term you could ask MarkLogic to expand that pattern to cover more
languages. It's only a matter of time before more users start wanting to
use non-ASCII scripts for usernames. I'm not sure if there's any technical
reason for the restriction. Using HTTP auth means user-id can't contain a
colon ':', but otherwise I believe anything goes. Of course browsers might
not support everything, and I'm not sure about LDAP, NTLM, etc.
-- Mike
On 1 Aug 2014, at 07:05, David Sewell <[email protected]> wrote:
We have a user-facing function that creates login names from users' real
names, using initial characters of the given name and surname to build a
MarkLogic user name. For the first time we recorded a server error when
someone registered with a name beginning with a non-ASCII character
(Norwegian Ø), because MarkLogic user names currently cannot contain
non-ASCII characters.
So I thought the easy solution would be to use xdmp:diacritic-less().
But no, that only changes characters like ñ and é that are accented
variants of a single letter. It does not touch combined characters like Ø
or Æ.
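To illustrate the distinction, here is a sketch in Python. It is not
MarkLogic code and makes no claim about how xdmp:diacritic-less is
implemented. Unicode decomposition splits ñ and é into a base letter plus a
combining mark, but Ø and Æ have no decomposition, so stripping combining
marks leaves them alone. The FALLBACK table is a hypothetical stand-in for
an explicit mapping, and its spellings are only one choice among several.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
# The spellings are an assumption; "OE"/"oe" for Ø/ø would be just as defensible.
FALLBACK = str.maketrans({"Ø": "O", "ø": "o", "Æ": "AE", "æ": "ae"})


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (handles ñ, é, and similar)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asciify(text: str) -> str:
    """Strip diacritics, then apply the explicit table for Ø, Æ and friends."""
    return strip_diacritics(text).translate(FALLBACK)
```

Note that fn:translate maps one character to one character, so a
two-letter spelling such as AE for Æ would need something like fn:replace
instead.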
Of course I could use fn:translate to catch all of the likely cases, but
is there a more general-purpose standard or extension function to perform
normalization to ASCII for accented/combined Latin characters in a
MarkLogic environment?
David S.
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general