1) storing of attachments (I haven't figured out yet whether it's better to store them in a separate workspace, because then you can leverage faster local filesystems instead of putting really big binaries into the database)

Probably better to store these in a second database -- sort of how, today, we allow you to use a different directory for attachments v. wikipages.

2) and more importantly, author names.

Now, in 2.8 we have a way to uniquely identify an author by an ID number, which allows for author name changes. This is quite fine, but I'm now unsure what should be stored in the backend.

The ID associated with documents and revisions and whatnot should be the unique ID number. That's classic normal-form stuff.

Storing the id alone brings in the following problems:
* Imports/exports break, since the repo model would only export the ID, and there would be no binding of that to real identity

You'd rely on the user/group managers to tie the user identity back to the IDs.

* Since the id=>identity mapping is not done in the JCR backend, every getAuthor() (w/out cache) would cause multiple DB accesses.

Yeah, but caching isn't too hard...
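To make that concrete, here is a minimal sketch of an in-memory cache in front of the id=>identity lookup. The class name CachingAuthorResolver and the backend function are hypothetical, not existing JSPWiki API; the backend stands in for whatever ends up doing the actual lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Hypothetical sketch: caches author names resolved from numeric author IDs. */
final class CachingAuthorResolver {
    // Assumption: the backend performs the expensive lookup (DB, LDAP, ...).
    private final LongFunction<String> backend;
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    CachingAuthorResolver(LongFunction<String> backend) {
        this.backend = backend;
    }

    /** Returns the author name for the ID, asking the backend only on a cache miss. */
    String getAuthor(long id) {
        return cache.computeIfAbsent(id, k -> backend.apply(k));
    }

    /** Must be called when an author is renamed, or the stale name keeps being served. */
    void invalidate(long id) {
        cache.remove(id);
    }
}
```

The one thing to get right is invalidation: since the whole point of the ID is to allow name changes, a rename has to evict the cached entry.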

* numeric ids are not necessarily available from the userdb backend (e.g. if you use LDAP or something similar), so they would be internal only - which means that if you export or access the content via other means, you would not be able to figure out the user.

The approach I've seen elsewhere is to have a place where you map the user IDs to the identifiers used on the "identity system of record," whether that be LDAP, a relational database or whatnot. This adds another level of indirection, of course, which sort of sucks, but it's really just one more table that would get stored in JCR.
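Roughly, that extra table could look like the sketch below. This is only an illustration kept in memory; the class AuthorIdentityMap and its method names are made up for the example, and in practice the two maps would be persisted (for instance as nodes in JCR) rather than held in HashMaps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: maps internal author IDs to identifiers in the system of record. */
final class AuthorIdentityMap {
    private final Map<Long, String> externalById = new HashMap<>();
    private final Map<String, Long> idByExternal = new HashMap<>();
    private long nextId = 1;

    /** Returns the internal ID bound to externalId, allocating a new one if needed. */
    synchronized long idFor(String externalId) {
        Long existing = idByExternal.get(externalId);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idByExternal.put(externalId, id);
        externalById.put(id, externalId);
        return id;
    }

    /** Resolves an internal ID back to the external identifier, if one is bound. */
    synchronized Optional<String> externalIdFor(long id) {
        return Optional.ofNullable(externalById.get(id));
    }

    /** Rebinds an ID when the external identifier changes; documents keep the same ID. */
    synchronized void rebind(long id, String newExternalId) {
        String old = externalById.get(id);
        if (old == null) {
            throw new IllegalArgumentException("unknown author id " + id);
        }
        Long owner = idByExternal.get(newExternalId);
        if (owner != null && owner != id) {
            throw new IllegalStateException("identifier already bound to another id");
        }
        idByExternal.remove(old);
        idByExternal.put(newExternalId, id);
        externalById.put(id, newExternalId);
    }
}
```

Exporting this table along with the content would also address the import/export concern, since the binding from ID to real identity travels with the data.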

One possibility would of course be to ditch any custom User/GroupDatabases and make them use the JCR backend, too. But that will tie them together, for better or worse.

We probably should keep the interfaces the way they are, but make the default implementation ("JCRUserDatabase") use the JCR back-end. Do we keep the XML and JDBC implementations for those who want them, or maybe even get rid of them?

Another possibility would be to store both the id *and* the WikiName.

Probably not a good idea -- just another thing to have to keep consistent... the potential integrity conflicts could be nasty.
