Hi list,

Given the following code,
import java.text.Normalizer;
...

        final Session session = ...

        final Repository rep = session.getRepository();
        System.out.println(rep.getDescriptor("jcr.repository.name") + " " + rep.getDescriptor("jcr.repository.version"));

        final Node root = session.getRootNode();
        final String name = "föö";
        System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = " + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
        System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = " + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
        root.addNode(name);
        session.save();

        final Node node1 = root.getNode(name);
        System.out.println("node1 = " + node1);
        final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
        System.out.println("node2 = " + node2);
        final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
        System.out.println("node3 = " + node3);

There's a good chance that fetching node3 won't work. It may depend on the underlying OS and database, but with OS X and Derby it fails. That's not really surprising, given that Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD)) is NOT true: the two forms are canonically equivalent, but they are different character sequences.
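
To make the difference visible, here is a minimal standalone sketch (no JCR involved) that prints the UTF-16 code units of both forms. The class name NormalFormsDemo and the helper codeUnits are made up for illustration; only java.text.Normalizer is a real API here.

```java
import java.text.Normalizer;

class NormalFormsDemo {
    // Hypothetical helper: render each UTF-16 code unit as U+XXXX.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "föö" written with escapes so the source file encoding does not matter.
        String name = "f\u00f6\u00f6";
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(name, Normalizer.Form.NFD);

        System.out.println("NFC: " + codeUnits(nfc)); // U+0066 U+00F6 U+00F6
        System.out.println("NFD: " + codeUnits(nfd)); // U+0066 U+006F U+0308 U+006F U+0308
        System.out.println("equal: " + nfc.equals(nfd)); // equal: false
    }
}
```

NFC keeps each ö as one precomposed character, while NFD splits it into o plus a combining diaeresis, so any exact string comparison sees two different names.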

Now, given that different clients use different normalization forms (Firefox seems to encode URL parameters in NFD, Safari in NFC; Linux seems to use NFC, while the OS X Finder seems to favor NFD), wouldn't it be a safe bet to normalize all input at the repository level? Or do you consider this something client applications should do?
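
For the client-side option, this is roughly what I have in mind, as a sketch only. The helper toNfc and the class NfcNames are hypothetical, and the HashMap merely stands in for any store that matches names exactly; it says nothing about how a particular repository actually compares names.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class NfcNames {
    // Hypothetical helper: bring any incoming name to NFC before using it.
    public static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Stand-in for a store that compares names code unit by code unit.
        Map<String, String> store = new HashMap<>();

        String composed = "f\u00f6\u00f6";      // "föö" as a client sending NFC would
        String decomposed = "fo\u0308o\u0308";  // "föö" as a client sending NFD would

        // Write path: normalize before storing.
        store.put(toNfc(composed), "node");

        // Read path without normalization misses the entry.
        System.out.println(store.get(decomposed)); // null

        // Read path with normalization finds it.
        System.out.println(store.get(toNfc(decomposed))); // node
    }
}
```

The catch is that every client has to apply the same form on both the write and the read path, which is why doing it once at the repository level looks attractive to me.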

ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

Thanks for any tip, pointer, idea, feedback or reaction!

Cheers,

-greg

