Hi list,
Given the following code,
import java.text.Normalizer;
...
final Session session = ...
final Repository rep = session.getRepository();
System.out.println(rep.getDescriptor("jcr.repository.name") +
" " + rep.getDescriptor("jcr.repository.version"));
final Node root = session.getRootNode();
final String name = "föö";
System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = "
    + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = "
    + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
root.addNode(name);
session.save();
final Node node1 = root.getNode(name);
System.out.println("node1 = " + node1);
final Node node2 = root.getNode(Normalizer.normalize(name,
Normalizer.Form.NFC));
System.out.println("node2 = " + node2);
final Node node3 = root.getNode(Normalizer.normalize(name,
Normalizer.Form.NFD)); // fails
System.out.println("node3 = " + node3);
There's a good chance that fetching node3 won't work. It may depend
on the underlying OS and database, but with OS X and Derby it fails.
That's not really surprising, given that
Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD))
is NOT true: NFC stores "ö" as a single code point, while NFD stores it
as "o" followed by a combining diaeresis.
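To make the difference concrete, here is a small standalone sketch that uses only java.text.Normalizer (no JCR involved). The class and method names are mine, purely for illustration, and the name is written with Unicode escapes so that the source file encoding cannot change its form.

```java
import java.text.Normalizer;

public class NormalizationForms {
    // "föö" in composed form, spelled with escapes (hypothetical constant name).
    static final String NFC_NAME = "f\u00F6\u00F6";

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = nfc(NFC_NAME);
        String decomposed = nfd(NFC_NAME);

        // Composed: f, ö, ö. Decomposed: f, o, U+0308, o, U+0308.
        System.out.println("NFC length = " + composed.length());
        System.out.println("NFD length = " + decomposed.length());

        // Different char sequences, so String.equals() is false...
        System.out.println("equal as-is = " + composed.equals(decomposed));
        // ...but they agree once both are brought to the same form.
        System.out.println("equal after NFC = " + nfc(decomposed).equals(composed));
    }
}
```

Any lookup that compares names as plain strings will treat these two spellings as different names.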
Now, given that different clients use different normalization forms
(Firefox seems to encode URL parameters with NFD, Safari with NFC;
Linux uses NFC, while the OS X Finder seems to favor NFD), wouldn't it
be a safe bet to normalize all input at the repository level?
Or do you consider this something client applications should do?
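For the client-side option, a minimal sketch of what I have in mind is below. NodeNames and canonical() are hypothetical names, not part of JCR or Jackrabbit, and the choice of NFC as the canonical form is an assumption; any form would do as long as every client uses the same one.

```java
import java.text.Normalizer;

/** Hypothetical client-side helper; not part of any JCR API. */
public final class NodeNames {
    private NodeNames() {}

    /** Returns the name in NFC, leaving already-normalized input untouched. */
    public static String canonical(String name) {
        return Normalizer.isNormalized(name, Normalizer.Form.NFC)
                ? name
                : Normalizer.normalize(name, Normalizer.Form.NFC);
    }
}
```

Every name would then go through it on the way in and on the way out, e.g. root.addNode(NodeNames.canonical(name)) and root.getNode(NodeNames.canonical(name)). The obvious weakness is that it only works if every client remembers to do this, which is why I'm asking about the repository level.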
ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
Thanks for any tip, pointer, idea, feedback or reaction!
Cheers,
-greg