Jukka Zitting wrote:
[...]
- SQL query speed comparison with MySQL/PostgreSQL
- read/write comparisons with filesystems
I'm sure that Jackrabbit will lose on both of those comparisons. The
main benefit in using a JCR content repository comes not from
duplicating content structures found in existing storage models, but
in going beyond their current limitations.
I'm not sure if this is covered by the spec, but is it possible to query
a pre-selected set of nodes (e.g., a subtree or the direct children of a
node)? In this case I could imagine that Jackrabbit might be a lot
faster than an RDBMS.
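If I read the spec correctly, an XPath query can at least be anchored below a
path with the /jcr:root prefix, so only a subtree is searched. A minimal
sketch (the /content/site path and nt:unstructured are just placeholders):

    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;

    public class SubtreeQuery {
        // Searches only the subtree below /content/site (a made-up path)
        // instead of the whole workspace.
        public static NodeIterator findInSubtree(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();
            String stmt = "/jcr:root/content/site//element(*, nt:unstructured)";
            return qm.createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }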
Regarding the file system - doesn't that depend on the cache settings? I
could imagine that Jackrabbit offers more - or at least more easily
accessible - cache configurations, based on node types etc. But I have
to admit I haven't looked at the feature list for quite a long time, so
I'd better catch up before asking more silly questions :)
For example any non-trivial RDBMS application requires a number of
joins that can easily become quite expensive. Standard JCR doesn't
even support joins as a query concept, but the tree hierarchy gives
1-n relationships and thus many 1-n joins essentially for free. Thus
I'd not compare the raw query performance between a relational
database and a content repository, but rather the higher level
performance for selected use cases based on a content model that's
designed to best leverage the capabilities of the underlying system.
That sounds very reasonable. As a CMS developer, I'd be very interested
in use cases like these:
Find all documents with type="image" where the keyword list (a multi-value
property) contains "Spring" and "Flower", and the width is between 500
and 600px. That's a typical query in asset management.
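In JCR XPath, I'd imagine the query looks roughly like this (just a sketch;
nt:unstructured and the type/keywords/width property names stand in for the
real content model, and as far as I know a comparison on a multi-valued
property such as @keywords matches if any of its values matches):

    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;

    public class AssetQuery {
        // type, keywords and width are hypothetical properties on the
        // asset nodes; adjust to the actual content model.
        public static NodeIterator findSpringFlowerImages(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();
            String stmt = "//element(*, nt:unstructured)"
                + "[@type = 'image'"
                + " and @keywords = 'Spring'"
                + " and @keywords = 'Flower'"
                + " and @width >= 500 and @width <= 600]";
            return qm.createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }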
Find all documents whose content matches the XPath
//a[local-name() = 'xhtml' and namespace-uri() = 'http://...' and
starts-with(@href, 'lenya-document:c2c38f30-ff68-11dc-9682-9dea3e2477d4')]
That would be typical to find links that would be broken after a
document is removed from the live site. I know that JCR doesn't support
this directly - I guess this is where XML DBs shine. With JCR, is it
necessary to traverse all documents and query the content using XPath,
or is there a better solution?
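One workaround I could imagine - though it's nothing the spec gives you
directly - is to extract the link targets when a document is saved and store
them in a multi-valued property, so that finding referrers becomes an
ordinary property query instead of an XPath scan over every document body.
Roughly (node type and property name made up):

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;

    public class LinkIndex {
        // Hypothetical convention: on save, the application extracts all
        // lenya-document: hrefs from the XHTML body and stores them here.
        public static void recordReferences(Node document, String[] targets) throws Exception {
            document.setProperty("references", targets);
            document.getSession().save();
        }

        // Finding documents that link to a removed target then becomes a
        // plain property query.
        public static NodeIterator findReferrers(Session session, String target) throws Exception {
            String stmt = "//element(*, nt:unstructured)[@references = '" + target + "']";
            return session.getWorkspace().getQueryManager()
                    .createQuery(stmt, Query.XPATH).execute().getNodes();
        }
    }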
The same goes for JCR versus the file system. Most non-trivial
applications that use the file system for storage end up using XML
files, or other parsed resources for handling fine-grained content
like individual dates, strings, numbers, etc. A content repository
natively supports such fine-grained content, so many read and update
operations that target such "small data" are much more convenient and
often faster than in a file-based solution that requires explicit
parsing and serializing (not to mention locking) of larger chunks of
data.
In Lenya, we use XML files + Lucene for content and meta data indexing.
Finding broken links (see above) is rather slow. Meta data queries are
quite fast. I'd be very interested in how this would change with Jackrabbit.
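If I understand the fine-grained content point correctly, updating a single
metadata value would then look roughly like this (path and property name made
up), with no XML parsing or re-serialization involved:

    import javax.jcr.Node;
    import javax.jcr.Session;

    public class MetaDataUpdate {
        // Reads and updates a single metadata property in place; the
        // /content/site/index path and the title property are placeholders.
        public static void setTitle(Session session, String newTitle) throws Exception {
            Node doc = (Node) session.getItem("/content/site/index");
            String oldTitle = doc.getProperty("title").getString();
            doc.setProperty("title", newTitle);
            session.save();
            System.out.println("Changed title from " + oldTitle + " to " + newTitle);
        }
    }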
Martin, if you'd like to consider including the Lenya repository in your
comparison, I'd try to assist if I find the time.
Jukka, thanks a lot for your valuable comments,
-- Andreas
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01