[jira] [Commented] (JENA-524) Global Cache for servers hosting a large number of TDB stores

Andy Seaborne (JIRA) Tue, 17 Sep 2013 02:50:05 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769372#comment-13769372
 ]


Andy Seaborne commented on JENA-524:
------------------------------------

Within TDB, both long and Long are used.  Object map keys will likely win out 
because, in the long term, Long is too short for some ids in TDB.  
We can use the existing cache code or Google's Guava cache code and focus on 
the cache design for now.
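
To make the design question concrete, here is a minimal sketch of one bounded cache shared by many stores and keyed by an object rather than a bare long. It is not the attached patch and not TDB's existing cache code: GlobalCache, Key, storeName and id are hypothetical names, and java.util.LinkedHashMap stands in for whichever cache implementation (existing or Guava) is eventually chosen. The key still carries a long id for brevity; an object key leaves room to widen that later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: one bounded LRU cache shared by all open stores. */
final class GlobalCache<V> {

    /** Object key combining a store name and an id (illustrative, not a TDB class). */
    static final class Key {
        final String storeName;
        final long id;

        Key(String storeName, long id) {
            this.storeName = Objects.requireNonNull(storeName);
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return id == k.id && storeName.equals(k.storeName);
        }

        @Override
        public int hashCode() {
            return 31 * storeName.hashCode() + Long.hashCode(id);
        }
    }

    private final Map<Key, V> map;

    GlobalCache(final int maxEntries) {
        // accessOrder = true keeps entries in least-recently-used order,
        // so the eldest entry is the one to evict when the bound is exceeded.
        this.map = new LinkedHashMap<Key, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Coarse locking keeps the sketch simple; a real server would want
    // something with less contention.
    synchronized V get(String storeName, long id) {
        return map.get(new Key(storeName, id));
    }

    synchronized void put(String storeName, long id, V value) {
        map.put(new Key(storeName, id), value);
    }

    synchronized int size() {
        return map.size();
    }
}
```

Because the bound applies to the whole map, memory use stays fixed however many stores are open, and entries from idle stores are evicted first. The cost is an object allocation per lookup for the key, which is the trade-off against primitive long keys.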
                
> Global Cache for servers hosting a large number of TDB stores
> -------------------------------------------------------------
>
>                 Key: JENA-524
>                 URL: https://issues.apache.org/jira/browse/JENA-524
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: TDB
>    Affects Versions: TDB 0.10.1
>            Reporter: André Lanka
>            Priority: Minor
>              Labels: patch
>             Fix For: TDB 1.0.0
>
>         Attachments: patch_hojoki_global_cache.txt
>
>
> Hello,
> we (namely Hojoki) have used Jena/TDB for a couple of years. In 2011 we started 
> to implement a global cache shared across all TDB stores currently open on a 
> server. The motivation was that we need many TDB stores on a single 
> machine to provide parallel write access to the different graphs. Our goal 
> was to have more than 2000 stores on a single machine. As we have only 8GB of 
> memory for the JVM, we can't use appropriately sized local caches for each store.
> So, we decided to implement a global shared cache for both Nodes/NodeIDs and 
> Blocks. We have tested our changes intensively with the current TDB version 
> 0.10.1 since it was released, and it works well. Currently we host more than 
> 5000 stores on each server, containing more than a billion triples on each 
> server (stored in roughly 150-200 GB of TDB data). The cache has a size of 
> approximately 500 MB.
> We would be very happy to integrate our changes into the official TDB 
> branch. Our cache can be turned on by calling SystemTDB.useGlobalCache(true). 
> If this method is not called, the factories use the original NodeTableCache 
> and the original BlockMgrCache. If it is called, our table and our manager are 
> used. Of course, it has some overhead, but at least it makes it possible to 
> have this large number of stores on a single machine.
> We have only tested it with FileMode.direct, as that is the only mode we use 
> (for smaller file sizes, and because we know for sure when changes are written 
> to disk -- important for our backup mechanism). The cache applies only to the 
> big data files on disk, not to the journal files.
> I can provide a patch I created yesterday against the current snapshot 
> version (I can't find an upload field in this "Create issue" mask). The patch 
> still contains a few tests that are Hojoki-specific, and some parts need a 
> more general approach (configuration via config files instead of code 
> constants, and similar things).
> Anyway, if you allow us to integrate our changes, I'll improve these parts.
> What do you think?
> Best wishes
> André

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
