1) storing of attachments (I haven't figured out yet whether it's
better to store them in a separate workspace, because then you can
leverage faster local filesystems instead of putting really big
binaries into the database)
Probably better to store these in a second database -- sort of how,
today, we allow you to use a different directory for attachments vs.
wikipages.
2) and more importantly, author names.
Now, we have in 2.8 a way to uniquely identify an author by an id
number, allowing for author name changes. This is quite fine, but
I'm now unsure what should be stored into the backend.
The ID associated with documents and revisions and whatnot should be
the unique ID number. That's classic normal-form stuff.
Storing the id alone brings in the following problems:
* Imports/exports break, since the repo model would only export the
ID, and there would be no binding of that to real identity
You'd rely on the user/group managers to tie the user identity back to
the IDs.
* Since the id=>identity mapping is not done in the JCR backend,
every getAuthor() (w/out cache) would cause multiple DB accesses.
Yeah, but caching isn't too hard...
* numeric ids are not necessarily available from the userdb backend
(e.g. if you use LDAP or something similar), so they would be
internal only - which means that if you export or access the content
via other means, you would not be able to figure out the user.
The approach I've seen elsewhere is to have a place where you map the
user IDs to the identifiers used on the "identity system of record,"
whether that be LDAP, a relational database or whatnot. This adds
another level of indirection, of course, which sort of sucks, but it's
really just one more table that would get stored in JCR.
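To make that indirection concrete, here is a rough sketch of what such a mapping table could look like, with a small cache in front of the name lookup so that repeated getAuthor()-style calls don't hit the backend every time. Everything in it is hypothetical: AuthorDirectory, AuthorBinding, bind(), getAuthorName() and the "ldap" source label are made-up names for illustration, not JSPWiki or JCR APIs, and the in-memory maps merely stand in for whatever would actually be persisted in JCR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only: none of these names exist in JSPWiki or JCR.
final class AuthorDirectory {

    // One row of the mapping table: internal ID -> identifier in the system of record.
    static final class AuthorBinding {
        public final long authorId;      // the unique ID stored with pages and revisions
        public final String source;      // e.g. "ldap" or "jdbc" (illustrative labels)
        public final String externalId;  // identifier as the system of record knows it

        AuthorBinding(long authorId, String source, String externalId) {
            this.authorId = authorId;
            this.source = source;
            this.externalId = externalId;
        }
    }

    // Stand-in for the table that would live in the repository.
    private final Map<Long, AuthorBinding> bindings = new ConcurrentHashMap<>();
    // Cache of resolved display names, keyed by internal ID.
    private final Map<Long, String> nameCache = new ConcurrentHashMap<>();
    // Resolves a binding to a display name, e.g. by asking the user database.
    private final Function<AuthorBinding, String> nameLookup;

    AuthorDirectory(Function<AuthorBinding, String> nameLookup) {
        this.nameLookup = nameLookup;
    }

    // Adds or replaces the binding for an internal ID and drops any cached name.
    void bind(long authorId, String source, String externalId) {
        bindings.put(authorId, new AuthorBinding(authorId, source, externalId));
        nameCache.remove(authorId);
    }

    // Returns the author's current name; the lookup runs at most once per ID
    // until the cache entry is invalidated.
    Optional<String> getAuthorName(long authorId) {
        AuthorBinding binding = bindings.get(authorId);
        if (binding == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(
            nameCache.computeIfAbsent(authorId, id -> nameLookup.apply(binding)));
    }

    // Call this when the system of record reports a name change.
    void invalidate(long authorId) {
        nameCache.remove(authorId);
    }
}
```

With something along these lines, documents and revisions carry only the numeric ID, a rename touches only the system of record (plus one invalidate() call), and an export could write out the binding rows alongside the content so the IDs stay meaningful outside the wiki.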
One possibility would of course be to ditch any custom User/
GroupDatabases and make them use the JCR backend, too. But that
will tie them together for better or worse.
We probably should keep the interfaces the way they are, but make the
default implementation ("JCRUserDatabase") use the JCR backend. Do we
keep the XML and JDBC implementations for those who want them, or
maybe even get rid of them?
Another possibility would be to store both the id *and* the WikiName.
Probably not a good idea -- just another thing to have to keep
consistent... the potential integrity conflicts could be nasty.