On 2013-10-08 22:13, Jukka Zitting wrote:
Hi,

On Tue, Oct 8, 2013 at 11:45 AM, Julian Reschke <[email protected]> wrote:
And these arbitrary keys really require that two different normalization
forms remain different?

I'm afraid they probably do.

While it's unlikely for unnormalized data to be used too frequently in
practice, someone could still easily craft a request or a piece of
content that could confuse code that doesn't expect the repository to
do auto-normalization. Another potentially troublesome example is
in-memory caches and other data structures that use paths as keys and
could thus be circumvented or potentially polluted with invalid data
if we relax path semantics. And yet another is the practice of
avoiding a too flat content hierarchy by distributing content across
subtrees based on the first few characters of a node name, which could
lead to lost, misplaced or duplicated content depending on how the
hierarchy is accessed.
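
To make the cache and sharding concerns above concrete, here is a minimal sketch using only java.text.Normalizer from the JDK. The file name, the "/content/" prefix and the two-character bucketing scheme are assumptions chosen for illustration, not anything taken from an actual repository implementation.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class NormalizationKeys {
    // Precomposed form (NFC): each "é" is the single code point U+00E9.
    public static final String NFC_NAME = "\u00e9t\u00e9.txt";
    // Decomposed form (NFD): each "é" becomes "e" followed by U+0301.
    public static final String NFD_NAME =
            Normalizer.normalize(NFC_NAME, Normalizer.Form.NFD);

    // Hypothetical sharding scheme: bucket a node by the first two chars of its name.
    public static String bucketFor(String nodeName) {
        return nodeName.substring(0, Math.min(2, nodeName.length()));
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("/content/" + NFC_NAME, "cached value");

        // The two names render identically but are different strings.
        System.out.println(NFC_NAME.equals(NFD_NAME));
        // A cache keyed by the NFC path is missed by the NFD path.
        System.out.println(cache.containsKey("/content/" + NFD_NAME));
        // The two forms also land in different buckets.
        System.out.println(bucketFor(NFC_NAME).equals(bucketFor(NFD_NAME)));
    }
}
```

If the repository silently treated both forms as the same node, code like this would keep treating them as distinct, which is the kind of mismatch described above.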

The use case is real-world users who mix platforms (Windows, Mac) and
browsers (WebKit vs the rest) and end up with two nodes where there should
be only one.

And no, it would need to be done consistently (file upload through browser,
WebDAV access, other HTTP-based APIs, etc.), and thus would be very hard to
do all over the place.

Right, but it still would be doable on that level without potentially
compromising clients that use JCR directly. Combined with a
repository-level validation mechanism that rejects non-normalized
content (or content that after normalization would conflict with
existing content), we could still catch cases where such higher-level
processing hasn't been applied and prevent those from causing trouble.
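
As a rough sketch of what such a validation check might look like, assuming NFC as the required form (the thread does not fix a particular form) and using hypothetical class and method names:

```java
import java.text.Normalizer;
import java.util.Set;

public class NormalizationValidator {
    // Assumption: the repository would require NFC; this thread does not say which form.
    private static final Normalizer.Form FORM = Normalizer.Form.NFC;

    /** Returns true if the name is already normalized and could be accepted as-is. */
    public static boolean isAcceptable(String name) {
        return Normalizer.isNormalized(name, FORM);
    }

    /**
     * Returns true if a different existing sibling name normalizes to the
     * same string as the given name, i.e. the two would collide once normalized.
     */
    public static boolean conflictsAfterNormalization(String name, Set<String> existingNames) {
        String candidate = Normalizer.normalize(name, FORM);
        for (String existing : existingNames) {
            if (!existing.equals(name)
                    && Normalizer.normalize(existing, FORM).equals(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

A commit-time check could reject a change when isAcceptable returns false, or, in the more lenient variant, only when conflictsAfterNormalization returns true.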

That sounds like you propose to do the normalization-on-lookup one layer above the JCR API. Won't that be extremely expensive?

I wonder whether we could make normalization (or lack of it) depend on a
mixin type?

Another potential solution might be to make such behavior
session-specific. An extra session attribute could be used to enable
auto-normalization just for that session. Clients that expect
filesystem semantics could use that option, while existing
database-oriented clients wouldn't have to worry about such things
(apart from the potential validation errors).
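
A minimal sketch of the session-specific idea, with the session attributes modelled as a plain map so the example stays self-contained. The attribute name "autoNormalizePaths" and the class are hypothetical, and NFC is again an assumption; in JCR itself such a flag would presumably be passed via credentials attributes and read back through Session.getAttribute.

```java
import java.text.Normalizer;
import java.util.Map;

public class SessionPathResolver {
    // Hypothetical attribute name; nothing in this thread defines it.
    public static final String AUTO_NORMALIZE = "autoNormalizePaths";

    private final boolean autoNormalize;

    public SessionPathResolver(Map<String, ?> sessionAttributes) {
        // Opt-in only: sessions without the attribute keep exact-match semantics.
        this.autoNormalize = Boolean.TRUE.equals(sessionAttributes.get(AUTO_NORMALIZE));
    }

    /** Returns the path to look up, normalized to NFC only if the session asked for it. */
    public String resolve(String path) {
        return autoNormalize
                ? Normalizer.normalize(path, Normalizer.Form.NFC)
                : path;
    }
}
```

With this shape, filesystem-style clients would opt in per session, while existing clients would see their paths passed through untouched.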

That's an interesting suggestion...

