André Lanka created JENA-524:
--------------------------------

             Summary: Global Cache for servers hosting a large number of TDB 
stores
                 Key: JENA-524
                 URL: https://issues.apache.org/jira/browse/JENA-524
             Project: Apache Jena
          Issue Type: New Feature
          Components: TDB
    Affects Versions: TDB 0.10.1
            Reporter: André Lanka
            Priority: Minor
             Fix For: TDB 0.10.2


Hello,

we (namely Hojoki) have been using Jena/TDB for a couple of years. In 2011 we
started implementing a global cache shared across all TDB stores currently open
on a server. The motivation was that we need many TDB stores on a single
machine to provide parallel write access to the different graphs. Our goal was
to have more than 2000 stores on a single machine. As we have only 8 GB of
memory for the JVM, we can't use appropriately sized local caches for each store.

So we decided to implement a global shared cache for both Nodes/NodeIDs and
Blocks. We have tested our changes intensively with the current TDB version
0.10.1 since its release, and they work well. Currently we host more than 5000
stores on each server, containing more than a billion triples per server
(stored in roughly 150-200 GB of TDB data). The cache has a size of
approximately 500 MB.
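
To illustrate the idea of one bounded cache shared by many stores, here is a
minimal sketch. It is not the code from the patch: the class and method names
are hypothetical (they are not Jena or TDB APIs), and bounding the cache by
entry count with LRU eviction is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: one bounded LRU cache shared by all stores on a server. */
final class GlobalBlockCache {

    /** Cache key: the store a block belongs to plus the block id within that store. */
    static final class Key {
        final String storeId;
        final long blockId;

        Key(String storeId, long blockId) {
            this.storeId = storeId;
            this.blockId = blockId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return blockId == k.blockId && storeId.equals(k.storeId);
        }

        @Override public int hashCode() {
            return Objects.hash(storeId, blockId);
        }
    }

    private final int maxEntries;
    private final LinkedHashMap<Key, byte[]> lru;

    GlobalBlockCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.lru = new LinkedHashMap<Key, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest) {
                // Evict the least recently used block once the global limit is exceeded,
                // regardless of which store it belongs to.
                return size() > GlobalBlockCache.this.maxEntries;
            }
        };
    }

    synchronized byte[] get(String storeId, long blockId) {
        return lru.get(new Key(storeId, blockId));
    }

    synchronized void put(String storeId, long blockId, byte[] block) {
        lru.put(new Key(storeId, blockId), block);
    }

    synchronized int size() {
        return lru.size();
    }
}
```

Because the limit applies to the cache as a whole, memory use stays bounded no
matter how many stores are open; stores that are rarely touched simply lose
their entries to busier ones.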

We would be very happy to integrate our changes into the official TDB branch.
Our cache can be turned on by calling SystemTDB.useGlobalCache(true). If this
method is not called, the factories use the original NodeTableCache and the
original BlockMgrCache. If it is called, our table and our manager are used
instead. Of course, this has some overhead, but it at least makes it possible
to have this large number of stores on a single machine.
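
The opt-in behaviour described above can be pictured roughly as follows. This
is only a sketch under assumptions: CacheSwitch, BlockCache and cacheFor are
hypothetical names invented for the example, not the patch and not existing
Jena/TDB classes.

```java
/** Hypothetical sketch of an opt-in switch; none of these names are Jena/TDB APIs. */
final class CacheSwitch {

    /** Stand-in for whatever cache abstraction a store would use. */
    interface BlockCache {
        String describe();
    }

    // Off by default, so existing behaviour is unchanged unless the switch is set.
    private static volatile boolean useGlobal = false;

    // The single instance handed to every store when the switch is on.
    private static final BlockCache SHARED = () -> "global";

    static void useGlobalCache(boolean on) {
        useGlobal = on;
    }

    /** Factory: the shared cache when switched on, otherwise a fresh cache per store. */
    static BlockCache cacheFor(String storeName) {
        if (useGlobal) {
            return SHARED;
        }
        return () -> "local:" + storeName;
    }
}
```

Keeping the decision inside the factory means callers that open a store do not
need to know which kind of cache they received.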

We have only tested it with FileMode.direct, as this is the only mode we use
(it gives us smaller file sizes, and we know for sure when changes are written
to disk, which is important for our backup mechanism). The cache applies only
to the big data files on disk, not to the journal files.

I can provide a patch I created yesterday against the current snapshot version
(I can't find an upload field in this "Create issue" form). The patch still
contains a few tests that are Hojoki-specific, and some parts could be made
more general (for example, configuration via config files instead of code
constants).

Anyway, if you allow us to integrate our changes, I'll improve these parts.

What do you think?

Best wishes
André

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
