Hi Vadim,

Vadim Gritsenko wrote:
> Edgar,
>
> Was trying to find more information following your references, but...
>
>> [1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435
>
> Points to JIRA which states [1]:
>
>    Comment by Edgar Poce [12/Jul/05 06:00 AM]
>    This kind of approach is discouraged by design
>
> Can you please clarify your point?

There are a couple of conversations about this in the archive. My point is that the PM contract is not suitable for mapping ItemStates into a relational database whose table design breaks each ItemState into its constituent parts. The PM is meant to be kept simple, which means storing each ItemState as a whole without interpreting its data. See the JDBC PM under contrib.
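To make the "store it as a whole" idea concrete, here is a minimal Java sketch. All names in it (BlobPersistenceSketch, NodeStateStub) are hypothetical and this is not Jackrabbit's PersistenceManager API; it only assumes the state is Serializable and that the backing table has a single (ID, DATA) shape, simulated here with a map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Jackrabbit's PersistenceManager API: each state
// is stored as one opaque serialized record keyed by its id.
class BlobPersistenceSketch {

    // Hypothetical stand-in for a node state; the PM never inspects its fields.
    static class NodeStateStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final List<String> childIds = new ArrayList<>();
        final Map<String, String> properties = new HashMap<>();

        NodeStateStub(String id) {
            this.id = id;
        }
    }

    // Simulates a single (ID, DATA) table where DATA is a BLOB column.
    private final Map<String, byte[]> table = new HashMap<>();

    // Serialize the whole state: changing one child or one property rewrites all of it.
    void store(NodeStateStub state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        table.put(state.id, bytes.toByteArray());
    }

    // Deserialize the whole record; returns null for unknown ids (a simplification).
    NodeStateStub load(String id) {
        byte[] data = table.get(id);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (NodeStateStub) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean exists(String id) {
        return table.containsKey(id);
    }
}
```

Because the PM never looks inside the record, the schema stays trivial, at the price of rewriting the full record on every store.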

The main problem with storing ItemStates in a complex schema is Collection handling. Since changes to Collection fields are not logged into add/update/remove-aware objects, all the elements of a Collection must be stored on each write call. This hurts performance when a collection has lots of elements, even with the simple PMs included in the core.

See the second chart in http://issues.apache.org/jira/browse/JCR-188. On my PIV box with the Object PM + CQFS, any write operation (e.g. setting a property) takes up to half a second once the given node reaches 3k children. If I ran the same test with the implementation proposed in JCR-91, the half-second mark would be reached with far fewer than 3k children; just a hundred children would make the repository unbearably slow.
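A rough sketch of why collection handling dominates in a normalized schema: the code below only counts simulated SQL statements for a hypothetical one-row-per-child CHILDREN(PARENT, CHILD) table. Without a change log the PM sees just the current list, so it must delete and re-insert every child row; a hypothetical change-logging list (LoggedList) would let it write only the delta. Neither class exists in Jackrabbit; they are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Jackrabbit code: counts simulated SQL statements
// for a normalized CHILDREN(PARENT, CHILD) table.
class ChildRowWriteSketch {

    // Without a change log: delete all child rows, then re-insert every one.
    static int storeChildrenFull(List<String> children) {
        int statements = 1;              // DELETE FROM CHILDREN WHERE PARENT = ?
        statements += children.size();   // one INSERT per child
        return statements;
    }

    // Hypothetical change-logging list: records what changed since the last store.
    static class LoggedList {
        final List<String> items = new ArrayList<>();
        final List<String> added = new ArrayList<>();
        final List<String> removed = new ArrayList<>();

        void add(String child) {
            items.add(child);
            // re-adding a child removed in this batch cancels the pending removal
            if (!removed.remove(child)) {
                added.add(child);
            }
        }

        void remove(String child) {
            // removing a child added in this batch cancels the pending insert
            if (items.remove(child) && !added.remove(child)) {
                removed.add(child);
            }
        }

        void clearLog() {
            added.clear();
            removed.clear();
        }
    }

    // With a change log: one DELETE per removed child, one INSERT per added child.
    static int storeChildrenDelta(LoggedList children) {
        int statements = children.removed.size() + children.added.size();
        children.clearLog();
        return statements;
    }
}
```

For a node with 3,000 stored children that gains one more, storeChildrenFull issues 3,002 statements while storeChildrenDelta issues 1.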

When I decided to write the JDBC PM proposed in JCR-91, I wanted:

1 - a mature, transactional and scalable persistence storage
2 - to use RDBMS administrative tools, like scheduled backups, etc.
3 - RDBMS referential integrity
4 - to avoid redundancy (PMs store the NodeReferences twice)
5 - a storage that makes it easy to modify the data, just in case

But in order to achieve the above goals the PM would have to interpret the data :(. Maybe we can bring this up again after the first release...

> Or, maybe point to the document /
> discussion regarding the design?
>
Even though it's not directly related, you might want to take a look at Dominique's post about Jackrabbit internals. See http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/1223

>> [2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
>
> Points to Wiki page which does not clarify your POV either.

It's not my point of view. I just collected the devs' opinions on this issue from the mailing list. If it's not clear, please trace the conversations in the archive and clarify the page.

> It states though:
>
>    The PM interface was never intended as being a general SPI that
>    you could implement in order to integrate external datasources
>    with proprietary formats (e.g. a customers database).
>
> This raises the question: what is the recommended SPI to code against?

I think that the jcr-ext project under contrib might be a good starting point. Or, even though the PM is not intended to be an SPI, you can manage to plug in your legacy data if you do it carefully.


> PS: The wiki page has an incorrect statement:
>
>     XML PersistenceManager
>       * Write operations are synchronized
>
> AFAICS, the XML PM (unnecessarily) synchronizes all calls, including load() and exists() calls.

Why incorrect? Maybe incomplete...

> Does it mean the FileSystem interface is considered to be
> single-threaded?

I don't think so.

> Does not make much sense, though...

I agree. I think the concurrency issue was first handled at the SHISM level, then moved to the PM, and then moved back to the SHISM (see http://issues.apache.org/jira/browse/JCR-164). Those synchronized modifiers seem to be there because the PM contract is not very clear yet, at least to me :(.
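If concurrency is the caller's job (the SHISM in this discussion), the PM methods don't need to be synchronized at all. One possible way for the caller to coordinate is a read/write lock, so load() and exists() can run concurrently while stores are exclusive. This is a hypothetical sketch (LockedStateManagerSketch is not a Jackrabbit class) and not necessarily how the SHISM actually does it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Jackrabbit code: the caller owns concurrency,
// so the underlying store needs no synchronized methods.
class LockedStateManagerSketch {
    private final Map<String, String> store = new HashMap<>();  // stand-in for an unsynchronized PM
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at once, so loads don't serialize.
    String load(String id) {
        rwLock.readLock().lock();
        try {
            return store.get(id);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    boolean exists(String id) {
        rwLock.readLock().lock();
        try {
            return store.containsKey(id);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Writes are exclusive: no reader or other writer runs while a store is in progress.
    void storeAll(Map<String, String> changes) {
        rwLock.writeLock().lock();
        try {
            store.putAll(changes);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```

The design choice is only where the lock lives: in the caller, once, rather than as a synchronized modifier on every PM method.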

br,
edgar

> Thanks,
> Vadim
>
> [1] http://issues.apache.org/jira/browse/JCR-91#action_12315534

