[
https://issues.apache.org/jira/browse/JENA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796665#comment-13796665
]
André Lanka commented on JENA-524:
----------------------------------
We use the cache for node and block management. The cache is only used for the
non-transactional part of the files, namely the "good old TDB files".
This way, we can rely on every written change being visible to any later
read call. No need to be aware of transactions.
We use one instance of BlockMgrFastGlobalCache per file, regardless of how
many datasets use its blocks.
We keep that instance for as long as the store/the file is open.
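To illustrate the "one instance per file" idea, here is a minimal sketch. It is not taken from the patch: FileCache and FileCacheRegistry are hypothetical names, and keying on the normalized absolute path is an assumption about how a file would be identified.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a per-file cache such as BlockMgrFastGlobalCache. */
final class FileCache {
    final Path file;

    FileCache(Path file) {
        this.file = file;
    }
}

/** Hands out exactly one FileCache per open file, however many datasets ask for it. */
final class FileCacheRegistry {
    private final ConcurrentMap<Path, FileCache> open = new ConcurrentHashMap<>();

    /** Returns the shared instance for this file, creating it on first use. */
    FileCache forFile(Path file) {
        // Assumption: the normalized absolute path identifies the file.
        Path key = file.toAbsolutePath().normalize();
        return open.computeIfAbsent(key, FileCache::new);
    }

    /** Drops the instance once the store/the file is closed. */
    void closeFile(Path file) {
        open.remove(file.toAbsolutePath().normalize());
    }
}
```

Two lookups for the same file return the same object, so every dataset that touches the file shares one cache until closeFile is called.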
So, all we need for concurrency is the following:
1. The data structures of the cache are thread safe (we do use such structures).
2. Concurrent reads of the same BlockMgr and the same blocks are allowed (as long
as no one else writes concurrently).
3. Concurrent writes to the same BlockMgr are allowed, but not to the same block.
(This situation shouldn't occur at all under TDB's MRSW pattern.)
4. Any write request to the BlockMgr has to be delivered _immediately_ to the
underlying BlockMgr, while respecting point 3.
As we synchronize in almost every public method of BlockMgrFastGlobalCache, we
currently serialize read accesses as well.
Of course, this is a performance drawback, but I was a little bit anxious in
the beginning. ;-)
The synchronized keywords in BlockMgrFastGlobalCache can be replaced with a
ReentrantReadWriteLock to enable concurrent read access to the files.
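A rough sketch of what that replacement could look like follows. It is not the code from the patch: BlockStore, MemoryBlockStore and RwLockedBlockCache are hypothetical stand-ins (TDB's real BlockMgr interface is larger), and the cache here is an unbounded map purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, much-reduced stand-in for the underlying block manager. */
interface BlockStore {
    byte[] read(long id);
    void write(long id, byte[] data);
}

/** Trivial in-memory store, only here to make the sketch runnable. */
class MemoryBlockStore implements BlockStore {
    private final Map<Long, byte[]> blocks = new ConcurrentHashMap<>();

    public byte[] read(long id) {
        byte[] data = blocks.get(id);
        return data == null ? null : data.clone();
    }

    public void write(long id, byte[] data) {
        blocks.put(id, data.clone());
    }
}

/** Write-through cache guarded by a read/write lock instead of synchronized. */
class RwLockedBlockCache implements BlockStore {
    private final BlockStore underlying;
    private final Map<Long, byte[]> cache = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    RwLockedBlockCache(BlockStore underlying) {
        this.underlying = underlying;
    }

    public byte[] read(long id) {
        lock.readLock().lock(); // many readers may hold this at once
        try {
            byte[] cached = cache.get(id);
            if (cached == null) {
                cached = underlying.read(id);
                if (cached != null) {
                    cache.put(id, cached); // safe: the map itself is thread safe
                }
            }
            return cached == null ? null : cached.clone();
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(long id, byte[] data) {
        lock.writeLock().lock(); // excludes readers and other writers
        try {
            byte[] copy = data.clone();
            underlying.write(id, copy); // delivered immediately (point 4)
            cache.put(id, copy);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that a single write lock serializes all writers, which is stricter than point 3 requires (writes to different blocks could run in parallel); finer-grained locking would be needed to exploit that.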
> Global Cache for servers hosting a large number of TDB stores
> -------------------------------------------------------------
>
> Key: JENA-524
> URL: https://issues.apache.org/jira/browse/JENA-524
> Project: Apache Jena
> Issue Type: New Feature
> Components: TDB
> Affects Versions: TDB 0.10.1
> Reporter: André Lanka
> Priority: Minor
> Labels: patch
> Attachments: patch_hojoki_global_cache.txt
>
>
> Hello,
> we (namely Hojoki) have been using Jena/TDB for a couple of years. We started in 2011
> to implement a global cache shared over all TDB stores currently opened on a
> server. The motivation was that we need to have many TDB stores on a single
> machine to provide parallel write access to the different graphs. Our goal
> was to have more than 2000 stores on a single machine. As we have only 8GB of
> memory for the JVM we can't use appropriate sized local caches for each store.
> So, we decided to implement a global shared cache for both Nodes/NodeIDs and
> Blocks. We have intensively tested our changes with the current TDB version 0.10.1
> since it came out and it works well. Currently we host more than 5000 stores
> on each server, containing more than a billion triples on each server (stored
> in roughly 150-200 GB of TDB data). The cache has a size of approximately
> 500 MB.
> We would be very happy if we could integrate our changes into the official TDB
> branch. Our cache can be turned on by calling SystemTDB.useGlobalCache(true).
> If this method is not called, the factories use the original NodeTableCache
> and the original BlockMgrCache. If it's called, our table and our manager are
> used. Of course, it has some overhead, but at least it's possible to have
> this large number of stores on a single machine.
> We only tested it with FileMode.direct as we only use this mode (for smaller
> file sizes, and we know for sure when changes are written to disk -- important
> for our backup mechanism). The cache applies only to the big data files on
> disk, not to the journal files.
> I can provide a patch I created yesterday against the current snapshot
> version (I can't find an upload field in this "Create issue" mask). The patch
> still contains a few tests that are Hojoki-specific and it could use
> a few more general approaches (configuration by config files instead of code
> constants, and such things).
> Anyway, if you allow us to integrate our changes, I'll improve these parts.
> What do you think?
> Best wishes
> André