Hi Vadim,
Vadim Gritsenko wrote:
Edgar,
Was trying to find more information following your references, but...
[1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435
Points to JIRA which states [1]:
Comment by Edgar Poce [12/Jul/05 06:00 AM]
This kind of approach is discouraged by design
Can you please clarify your point?
There are a couple of conversations in the archive about this. My point
is that the PM contract is not suitable for mapping the itemstates into
a relational database with a table design that breaks the ItemState into
its constituent parts. The PM is intended to be kept simple, which means
storing the itemstate as a whole without interpreting the data. See the
jdbc pm under contrib.
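
To illustrate what "store the itemstate as a whole" means, here is a
minimal sketch. It is not the jdbc pm from contrib and it does not use
Jackrabbit classes: BlobStoreSketch, DemoState and the in-memory map
(standing in for a single table with an id column and a BLOB column) are
all hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BlobStoreSketch {
    // Hypothetical stand-in for an item state; NOT Jackrabbit's ItemState.
    static final class DemoState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final List<String> childNames;

        DemoState(String id, List<String> childNames) {
            this.id = id;
            this.childNames = new ArrayList<>(childNames);
        }
    }

    // Assumption: this map plays the role of one table (id, data BLOB).
    private final Map<String, byte[]> rows = new HashMap<>();

    // The whole state becomes one opaque byte array; the store never looks inside.
    void store(DemoState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rows.put(state.id, bytes.toByteArray());
    }

    // Returns null when no row exists for the given id.
    DemoState load(String id) {
        byte[] data = rows.get(id);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DemoState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    int rowCount() {
        return rows.size();
    }
}
```

However many children the state has, it occupies exactly one row, which
is why this style cannot give you per-field columns or referential
integrity.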
The main problem with storing the itemstates in a complex schema is the
Collection handling. Since changes to Collection fields are not logged in
add/update/remove-aware objects, all the elements in the Collection must
be stored on each write call. This hurts performance when
handling collections with lots of elements, even with the simple PMs
included in the core.
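
The cost difference can be sketched with a toy model. Everything here is
hypothetical (ChildListWriteCost, the in-memory "table", the row-write
counter); it only counts how many child rows a normalized schema would
touch, assuming the store either sees just the final collection or is
told which entries changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChildListWriteCost {
    // Hypothetical counter for row-level deletes and inserts on a child-entries table.
    private int rowWrites = 0;

    // Assumption: one list of child rows per parent id.
    private final Map<String, List<String>> childTable = new HashMap<>();

    // No change tracking: only the final collection is known, so every
    // existing row is deleted and every current element is reinserted.
    void storeWholeCollection(String parentId, List<String> children) {
        List<String> old = childTable.get(parentId);
        if (old != null) {
            rowWrites += old.size();      // one delete per old row
        }
        childTable.put(parentId, new ArrayList<>(children));
        rowWrites += children.size();     // one insert per current element
    }

    // With change tracking (not what the PM receives today, per the text
    // above): only the added and removed entries are touched.
    void storeDelta(String parentId, List<String> added, List<String> removed) {
        List<String> rows = childTable.computeIfAbsent(parentId, k -> new ArrayList<>());
        rows.removeAll(removed);
        rows.addAll(added);
        rowWrites += added.size() + removed.size();
    }

    int rowWrites() {
        return rowWrites;
    }
}
```

In this toy model, adding one child to a node that already has 3000
children costs 6001 row writes with storeWholeCollection (3000 deletes
plus 3001 inserts) and 1 row write with storeDelta.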
See the second chart in http://issues.apache.org/jira/browse/JCR-188. On
my PIV box with Object PM + cqfs, any write operation (e.g. setting a
property) takes up to half a second once the given node reaches 3k children.
If I ran the same test with the impl proposed in JCR-91, the
half-second mark would be reached much sooner than at 3k children; just a
hundred children would make the repo unbearably slow.
When I decided to write the jdbc pm proposed in JCR-91 I wanted:
1 - a mature, transactional and scalable persistence storage
2 - to use rdbms administrative tools, like scheduled backups, etc.
3 - rdbms referential integrity
4 - to avoid redundancy. PMs store the NodeReferences twice.
5 - a storage that allows the data to be modified easily, just in case.
But in order to achieve the above goals the PM would have to interpret
the data :(. Maybe we can bring this up again after the first release ...
> Or, maybe point to the document /
> discussion regarding the design?
>
Even though it's not directly related, you might want to take a look at
Dominique's post about Jackrabbit internals. See
http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/1223
[2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
Points to Wiki page which does not clarify your POV either.
It's not my point of view. I just collected the devs' opinions on this
issue from the mailing list. If it's not clear, please trace the
conversations in the archive and clarify it.
> It states though:
The PM interface was never intended as being a general SPI that
you could implement in order to integrate external datasources
with proprietary formats (e.g. a customers database).
This raises the question, what is the recommended SPI to code against?
I think that the jcr-ext project under contrib might be a good starting
point. Or, although the PM is not intended to be an SPI, you can manage to
plug in your legacy data if you do it carefully.
PS Wiki page has incorrect statement:
XML PersistenceManager
* Write operations are synchronized
AFAICS, XML PM (unnecessarily) synchronizes all calls, including load()
and exist() calls.
Why incorrect? Maybe incomplete...
> Does it mean the FileSystem interface is considered to be
> single threaded?
I don't think so.
> Does not make much sense, though...
I agree. I think that the concurrency issue was handled first at the
SHISM level, then it was moved to the PM, and then back to the SHISM
(see http://issues.apache.org/jira/browse/JCR-164). Those synchronized
modifiers seem to be there because the PM contract is not very clear
yet, at least for me :(.
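
To make the difference between "write operations are synchronized" and
"all calls are synchronized" concrete, here is a small sketch. It is not
the XML PM code; SyncScopeSketch and both nested classes are hypothetical,
and the method names only loosely mirror the calls mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SyncScopeSketch {
    // Variant A: every call takes the same monitor, so concurrent
    // readers block each other as well as writers.
    static final class AllCallsSynchronized {
        private final Map<String, String> data = new HashMap<>();

        synchronized void store(String id, String value) {
            data.put(id, value);
        }

        synchronized String load(String id) {
            return data.get(id);
        }

        synchronized boolean exists(String id) {
            return data.containsKey(id);
        }
    }

    // Variant B: only writes are serialized; reads go straight to a
    // thread-safe map and do not wait for each other.
    static final class WritesSynchronized {
        private final Map<String, String> data = new ConcurrentHashMap<>();

        synchronized void store(String id, String value) {
            data.put(id, value);
        }

        String load(String id) {
            return data.get(id);
        }

        boolean exists(String id) {
            return data.containsKey(id);
        }
    }
}
```

Both variants return the same results; they differ only in how much
readers contend. Which one is appropriate depends on what the layer
above already guarantees, which is exactly the part of the contract
that is unclear.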
br,
edgar
Thanks,
Vadim
[1] http://issues.apache.org/jira/browse/JCR-91#action_12315534