[
https://issues.apache.org/jira/browse/JENA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828882#comment-13828882
]
André Lanka commented on JENA-524:
----------------------------------
Sorry Andy, I missed your comments....
It's faster to compare a single long value than to call equals/hashCode on an
object made of two parts (file reference, block ID). This is why we map the path
and the block ID to one long value. Of course it is a drawback that everything
then ends up in a Long-based map (because of (un)boxing the long values). As we
plan to introduce a long-based (not Long-based) LinkedHashMap, we could profit
even more from the single-long approach.
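For illustration only, the packing idea roughly looks like the sketch below.
The class name, method names and the 32/32 bit split are assumptions for this
sketch, not the values used in the patch.

    // Pack a file reference and a block ID into one primitive long, so that a
    // cache lookup compares a single long instead of calling equals()/hashCode()
    // on a (file, blockId) pair. Names and the 32/32 split are illustrative only.
    final class PackedKey {
        private static final int  VALUE_BITS = 32;
        private static final long VALUE_MASK = (1L << VALUE_BITS) - 1;

        // Each open file is assigned a small integer index; it goes into the high bits.
        static long pack(int fileIndex, long blockId) {
            return ((long) fileIndex << VALUE_BITS) | (blockId & VALUE_MASK);
        }

        static int  fileIndex(long key) { return (int) (key >>> VALUE_BITS); }
        static long blockId(long key)   { return key & VALUE_MASK; }
    }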
Yes, the mappers (FilenameAndBlockIDLongMapper and FilenameAndFilePosLongMapper)
are essentially the same. The only difference is how the bits are split between
the first and second part of the long value. The reason is that the block mapper
has to manage more files with fewer block numbers (each block is 8 kB), whereas
the ID mapper (for the node IDs) has to manage fewer files but larger values for
the position within a file. Of course a single class with parameterized bit
values could also handle this, yet accessing a public static final variable is
way faster than accessing an instance-dependent one. This is why I used this
slightly irritating approach.
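To make the trade-off concrete, here is a sketch of the two shapes the design
can take; class names and bit widths are assumptions for illustration, not the
ones from the patch.

    // Patch-style design: one class per mapping, the split fixed as a constant.
    final class BlockIdMapper {
        // More files, smaller block numbers: give the block part fewer bits.
        public static final int VALUE_BITS = 28;
        static long map(int fileIndex, long blockId) {   // masking omitted for brevity
            return ((long) fileIndex << VALUE_BITS) | blockId;
        }
    }

    final class FilePosMapper {
        // Fewer files, larger in-file positions: give the position part more bits.
        public static final int VALUE_BITS = 44;
        static long map(int fileIndex, long filePos) {
            return ((long) fileIndex << VALUE_BITS) | filePos;
        }
    }

    // Parameterized alternative: a single class, but the shift is read from an
    // instance field on every call instead of being a compile-time constant.
    final class ParameterizedMapper {
        private final int valueBits;
        ParameterizedMapper(int valueBits) { this.valueBits = valueBits; }
        long map(int fileIndex, long value) {
            return ((long) fileIndex << valueBits) | value;
        }
    }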
> Global Cache for servers hosting a large number of TDB stores
> -------------------------------------------------------------
>
> Key: JENA-524
> URL: https://issues.apache.org/jira/browse/JENA-524
> Project: Apache Jena
> Issue Type: New Feature
> Components: TDB
> Affects Versions: TDB 0.10.1
> Reporter: André Lanka
> Priority: Minor
> Labels: patch
> Attachments: patch_hojoki_global_cache.txt
>
>
> Hello,
> we (namely Hojoki) have been using Jena/TDB for a couple of years. In 2011 we
> started implementing a global cache shared across all TDB stores currently open
> on a server. The motivation was that we need many TDB stores on a single
> machine to provide parallel write access to the different graphs. Our goal
> was to have more than 2000 stores on a single machine. As we have only 8 GB of
> memory for the JVM, we can't use appropriately sized local caches for each
> store. So we decided to implement a global shared cache for both Nodes/NodeIDs
> and Blocks. We have tested our changes intensively with the current TDB version
> 0.10.1 since its release, and it works well. Currently we host more than 5000
> stores per server, containing more than a billion triples each (stored in
> roughly 150-200 GB of TDB data). The cache has a size of approximately 500 MB.
> We would be very happy if we could integrate our changes into the official TDB
> branch. Our cache can be turned on by calling SystemTDB.useGlobalCache(true)
> (a short usage sketch follows at the end of this message). If this method is
> not called, the factories use the original NodeTableCache and the original
> BlockMgrCache; if it is called, our table and our manager are used. Of course
> this adds some overhead, but at least it makes it possible to host this large
> number of stores on a single machine.
> We only tested it with FileMode.direct, as this is the only mode we use (for
> smaller file sizes, and because we know for sure when changes are written to
> disk -- important for our backup mechanism). The cache applies only to the big
> data files on disk, not to the journal files.
> I can provide a patch I created yesterday against the current snapshot version
> (I can't find an upload field in this "Create issue" mask). The patch still
> contains a few tests that are purely Hojoki-specific, and it could use some
> more general approaches (configuration via config files instead of constants
> in code, and the like).
> Anyway, if you allow us to integrate our changes, I'll improve these parts.
> What do you think?
> Best wishes
> André
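As a usage note on the SystemTDB.useGlobalCache(true) switch mentioned above:
a minimal sketch of opening stores with the global cache enabled could look like
the following. The useGlobalCache method comes from the attached patch;
TDBFactory.createDataset is ordinary Jena TDB API, and the store paths are made
up for this example.

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.tdb.sys.SystemTDB;

    public class GlobalCacheExample {
        public static void main(String[] args) {
            // Switch added by the attached patch: call it before any store is
            // opened, so the factories wire in the shared cache instead of the
            // per-store NodeTableCache / BlockMgrCache.
            SystemTDB.useGlobalCache(true);

            // All stores opened afterwards share the one global cache.
            Dataset store1 = TDBFactory.createDataset("/data/tdb/store-0001");
            Dataset store2 = TDBFactory.createDataset("/data/tdb/store-0002");
            // ... thousands more stores per JVM, as described above.
        }
    }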
--
This message was sent by Atlassian JIRA
(v6.1#6144)