Jukka Zitting wrote:
[...]
- SQL query speed comparison with MySQL/PostgreSQL
- read/write comparisons with filesystems
I'm sure that Jackrabbit will lose on both of those comparisons. The
main benefit in using a JCR content repository comes not from
duplicating content structures found in existing storage models, but
in going beyond their current limitations.
I'm not sure if this is covered by the spec, but is it possible to query
a pre-selected set of nodes (e.g., a subtree or the direct children of a
node)? In this case I could imagine that Jackrabbit might be a lot
faster than an RDBMS.
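If I read the spec correctly, an XPath query can at least be anchored below a
path with the /jcr:root prefix, so only a subtree is searched. A minimal
sketch (the /content/site path and nt:unstructured are just placeholders):

    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;

    public class SubtreeQuery {
        // Searches only the subtree below /content/site (a made-up path)
        // instead of the whole workspace.
        public static NodeIterator findInSubtree(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();
            String stmt = "/jcr:root/content/site//element(*, nt:unstructured)";
            return qm.createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }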
Regarding the file system - doesn't that depend on the cache settings? I
could imagine that Jackrabbit offers more - or at least more easily
accessible - cache configurations, based on node types etc. But I have
to admit I haven't looked at the feature list for quite a long time, so
I'd better catch up before asking more silly questions :)
For example any non-trivial RDBMS application requires a number of
joins that can easily become quite expensive. Standard JCR doesn't
even support joins as a query concept, but the tree hierarchy gives
1-n relationships and thus many 1-n joins essentially for free. Thus
I'd not compare the raw query performance between a relational
database and a content repository, but rather the higher level
performance for selected use cases based on a content model that's
designed to best leverage the capabilities of the underlying system.
That sounds very reasonable. As a CMS developer, I'd be very interested
in use cases like these:
Find all documents with type="image" where the keyword list (a multi-value
property) contains "Spring" and "Flower", and the width is between 500
and 600px. That's a typical query in asset management.
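In JCR XPath, I'd imagine the query looks roughly like this (just a sketch;
nt:unstructured and the type/keywords/width property names stand in for the
real content model, and as far as I know a comparison on a multi-valued
property such as @keywords matches if any of its values matches):

    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;

    public class AssetQuery {
        // type, keywords and width are hypothetical properties on the
        // asset nodes; adjust to the actual content model.
        public static NodeIterator findSpringFlowerImages(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();
            String stmt = "//element(*, nt:unstructured)"
                + "[@type = 'image'"
                + " and @keywords = 'Spring'"
                + " and @keywords = 'Flower'"
                + " and @width >= 500 and @width <= 600]";
            return qm.createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }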
Find all documents whose content matches the XPath
//a[local-name() = 'xhtml' and namespace-uri() = 'http://...' and
starts-with(@href, 'lenya-document:c2c38f30-ff68-11dc-9682-9dea3e2477d4')]
That would be typical to find links that would be broken after a
document is removed from the live site. I know that JCR doesn't support
this directly - I guess this is where XML DBs shine. With JCR, is it
necessary to traverse all documents and query the content using XPath,
or is there a better solution?
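One workaround I could imagine - though it's nothing the spec gives you
directly - is to extract the link targets when a document is saved and store
them in a multi-valued property, so that finding referrers becomes an
ordinary property query instead of an XPath scan over every document body.
Roughly (node type and property name made up):

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;

    public class LinkIndex {
        // Hypothetical convention: on save, the application extracts all
        // lenya-document: hrefs from the XHTML body and stores them here.
        public static void recordReferences(Node document, String[] targets) throws Exception {
            document.setProperty("references", targets);
            document.getSession().save();
        }

        // Finding documents that link to a removed target then becomes a
        // plain property query.
        public static NodeIterator findReferrers(Session session, String target) throws Exception {
            String stmt = "//element(*, nt:unstructured)[@references = '" + target + "']";
            return session.getWorkspace().getQueryManager()
                    .createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }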
The same goes for JCR versus the file system. Most non-trivial
applications that use the file system for storage end up using XML
files, or other parsed resources for handling fine-grained content
like individual dates, strings, numbers, etc. A content repository
natively supports such fine-grained content, so many read and update
operations that target such "small data" are much more convenient and
often faster than in a file-based solution that requires explicit
parsing and serializing (not to mention locking) of larger chunks of
data.
In Lenya, we use XML files + Lucene for content and meta data indexing.
Finding broken links (see above) is rather slow. Meta data queries are
quite fast. I'd be very interested in how this would change with Jackrabbit.
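If I understand the fine-grained content point correctly, updating a single
metadata value would then look roughly like this (path and property name made
up), with no XML parsing or re-serialization involved:

    import javax.jcr.Node;
    import javax.jcr.Session;

    public class MetaDataUpdate {
        // Reads and updates a single metadata property in place; the
        // /content/site/index path and the title property are placeholders.
        public static void setTitle(Session session, String newTitle) throws Exception {
            Node doc = (Node) session.getItem("/content/site/index");
            String oldTitle = doc.getProperty("title").getString();
            doc.setProperty("title", newTitle);
            session.save();
            System.out.println("Changed title from " + oldTitle + " to " + newTitle);
        }
    }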
Martin, if you'd like to consider including the Lenya repository in your
comparison, I'd try to assist if I find the time.
Jukka, thanks a lot for your valuable comments,
-- Andreas
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01