André Lanka created JENA-524:
--------------------------------
Summary: Global Cache for servers hosting a large number of TDB
stores
Key: JENA-524
URL: https://issues.apache.org/jira/browse/JENA-524
Project: Apache Jena
Issue Type: New Feature
Components: TDB
Affects Versions: TDB 0.10.1
Reporter: André Lanka
Priority: Minor
Fix For: TDB 0.10.2
Hello,
we (Hojoki) have been using Jena/TDB for a couple of years. In 2011 we started
to implement a global cache shared across all TDB stores currently open on a
server. The motivation was that we need many TDB stores on a single machine
to provide parallel write access to the different graphs. Our goal was to have
more than 2000 stores on a single machine. As we have only 8 GB of memory for
the JVM, we can't use appropriately sized local caches for each store.
So we decided to implement a global shared cache for both Nodes/NodeIDs and
Blocks. We have tested our changes intensively with the current TDB version
0.10.1 since it came out, and they work well. Currently we host more than 5000
stores on each server, containing more than a billion triples per server
(stored in roughly 150-200 GB of TDB data). The cache has a size of
approximately 500 MB.
We would be very happy if we could integrate our changes into the official TDB
branch. Our cache can be turned on by calling SystemTDB.useGlobalCache(true).
If this method is not called, the factories use the original NodeTableCache and
the original BlockMgrCache. If it is called, our table and our manager are used.
Of course, it has some overhead, but at least it makes it possible to have this
large number of stores on a single machine.
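
To illustrate the idea (this is not the code from our patch), here is a minimal
sketch of a single bounded cache shared by all open stores. The names
GlobalBlockCache, Key and dropStore are hypothetical, and so are the LRU
eviction policy and the entry-count limit; a real cache would more likely be
bounded by size in bytes. It uses only the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: one bounded LRU cache shared by all open stores. */
final class GlobalBlockCache {

    /** Cache key: which store, which file in that store, which block. */
    static final class Key {
        final String storeId;   // assumed identifier, e.g. the store directory
        final String fileName;  // data file within the store
        final long blockId;     // block number within the file

        Key(String storeId, String fileName, long blockId) {
            this.storeId = storeId;
            this.fileName = fileName;
            this.blockId = blockId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return blockId == other.blockId
                    && storeId.equals(other.storeId)
                    && fileName.equals(other.fileName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(storeId, fileName, blockId);
        }
    }

    private final int maxEntries;
    private final Map<Key, byte[]> blocks;

    GlobalBlockCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true: the eldest entry is the least recently used one.
        this.blocks = new LinkedHashMap<Key, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest) {
                // Evict once the shared limit is exceeded, whichever store owns the block.
                return size() > GlobalBlockCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached block, or null on a miss. */
    synchronized byte[] get(Key key) {
        return blocks.get(key);
    }

    synchronized void put(Key key, byte[] block) {
        blocks.put(key, block);
    }

    /** Removes all entries of one store, e.g. when that store is closed. */
    synchronized void dropStore(String storeId) {
        blocks.keySet().removeIf(k -> k.storeId.equals(storeId));
    }

    synchronized int size() {
        return blocks.size();
    }
}
```

The point of the sketch is that the memory limit applies to the server as a
whole: stores that are busy keep their blocks cached, while idle stores hold
almost nothing. The coarse synchronized methods are where some of the overhead
mentioned above would come from in such a design.
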
We have only tested it with FileMode.direct, as this is the only mode we use
(it gives smaller file sizes, and we know for sure when changes are written to
disk, which is important for our backup mechanism). The cache applies only to
the big data files on disk, not to the journal files.
I can provide a patch that I created yesterday against the current snapshot
version (I can't find an upload field in this "Create issue" form). The patch
still contains a few tests that are purely Hojoki-specific, and it could use a
few more general approaches (configuration via config files instead of code
constants, and the like).
Anyway, if you allow us to integrate our changes, I'll improve these parts.
What do you think?
Best wishes
André