[jira] Updated: (JCR-926) Global data store for binaries

Jukka Zitting (JIRA) Wed, 20 Jun 2007 14:12:46 -0700

     [ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jukka Zitting updated JCR-926:
------------------------------

    Attachment: ReadWhileSaveTest.patch

I made a few modifications to ReadWhileSaveTest to better illustrate the 
problem. With the attached patch the test saves a single 300MB file instead of 
a number of 10MB files. It also keeps track of how many times the root node 
is traversed while the 300MB file is being persisted.

The raw output of a test run is below:

    Wed Jun 20 23:41:17 EEST 2007 - setProperty() - 0
    Wed Jun 20 23:41:39 EEST 2007 - begin save() - 195
    Wed Jun 20 23:42:05 EEST 2007 - end save() - 197
    numReads: 198

Essentially:

    * setProperty(): 22 seconds, during which 195 root node traversals happened
    * save(): 26 seconds, during which 2 root node traversals happened

The two traversals reported for save() most likely happened between the 
println() and save() statements.
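
For readers without the patch at hand, the measurement idea can be sketched in 
plain Java. This is not the code from ReadWhileSaveTest.patch: the class name, 
the measure() helper and the shared monitor that stands in for the repository 
are all hypothetical, and Jackrabbit's actual locking is not modeled. A 
background thread repeats a "read" and bumps a counter, and the counter is 
sampled before and after each write phase.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadWhileWriteSketch {

    /**
     * Runs read in a background thread and samples the number of completed
     * reads before the first phase and after every phase.
     */
    static int[] measure(Runnable read, Runnable... phases) {
        AtomicInteger numReads = new AtomicInteger();
        Thread reader = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                read.run();
                numReads.incrementAndGet();
            }
        });
        reader.setDaemon(true);
        reader.start();

        int[] counts = new int[phases.length + 1];
        counts[0] = numReads.get();
        for (int i = 0; i < phases.length; i++) {
            phases[i].run();
            counts[i + 1] = numReads.get();
        }

        reader.interrupt();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Assumption: a single monitor stands in for whatever a writer holds.
        Object lock = new Object();
        Runnable read = () -> { synchronized (lock) { /* one traversal */ } };
        Runnable nonBlockingWrite = () -> sleep(200);
        Runnable blockingWrite = () -> { synchronized (lock) { sleep(200); } };

        int[] counts = measure(read, nonBlockingWrite, blockingWrite);
        System.out.println("reads during non-blocking phase: " + (counts[1] - counts[0]));
        System.out.println("reads during blocking phase: " + (counts[2] - counts[1]));
    }
}
```

As in the test run above, a few reads can still be counted against the blocking 
phase, because the reader may get in between the counter sample and the moment 
the writer actually acquires the monitor.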

Observations:

1) Currently we create one extra copy of the binary stream. Write 
performance would essentially be doubled simply by removing that extra copy. 
The stream passed to setProperty() should be given directly to the DataStore 
implementation so that no extra copies are needed.

2) More alarmingly, this seems to indicate that the fine-grained locking from 
JCR-314 does not work as well as it should, i.e. a save() still blocks readers. 
Note that I explicitly added a save() call after the "stuff" node is added to 
make sure that the write does not affect the nodes that are being read. I ran 
the test against the latest svn trunk, revision 549230.
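
To make observation 1 concrete, here is a minimal sketch of a data store that 
consumes the caller's stream directly and writes it to disk exactly once. It is 
not the interface from DataStore.patch: the class name, addRecord() and 
getRecord() are hypothetical, and using the SHA-1 digest of the content as the 
short identifier is an assumption made only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileDataStoreSketch {

    private final Path directory;

    public FileDataStoreSketch(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Streams the value to disk in a single pass and returns its identifier. */
    public String addRecord(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);
        }

        // The digest is computed while copying, so the stream is read only once.
        Path temp = Files.createTempFile(directory, "upload", ".tmp");
        Files.copy(new DigestInputStream(in, digest), temp,
                StandardCopyOption.REPLACE_EXISTING);

        StringBuilder id = new StringBuilder();
        for (byte b : digest.digest()) {
            id.append(String.format("%02x", b));
        }

        Path target = directory.resolve(id.toString());
        if (Files.exists(target)) {
            Files.delete(temp); // same content is already stored, nothing to add
        } else {
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        }
        return id.toString();
    }

    /** Opens a stored value by its identifier. */
    public InputStream getRecord(String identifier) throws IOException {
        return Files.newInputStream(directory.resolve(identifier));
    }

    public static void main(String[] args) throws IOException {
        FileDataStoreSketch store =
                new FileDataStoreSketch(Files.createTempDirectory("datastore"));
        byte[] data = "example binary value".getBytes(StandardCharsets.UTF_8);
        String first = store.addRecord(new ByteArrayInputStream(data));
        String second = store.addRecord(new ByteArrayInputStream(data));
        System.out.println(first.equals(second)); // identical content, same identifier
    }
}
```

Because records are only ever added and never modified, a property would only 
need to keep the identifier, which is what makes the extra copy in the 
transient space unnecessary in this sketch.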


> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: DataStore.patch, DataStore2.patch, 
> ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large 
> binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for 
> extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through 
> the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large 
> binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we 
> implement a global "data store" concept in the repository. A data store is an 
> append-only set of binary values that uses short identifiers to identify and 
> access the stored binary values. The data store would trivially fit the 
> requirements of transient space and transaction handling due to the 
> append-only nature. An explicit mark-and-sweep garbage collection process 
> could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more 
> background on this idea.
> [1] 
> http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/[EMAIL 
> PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
