Hi All

Is there any way to detect when a TDB2 database needs compaction?

Because of how updates arrive in our system via Kafka, we see disk usage grow 
quite large over time, so we’re trying to automate periodic compactions of the 
databases to keep this under control.  Right now, we’re just compacting 
whenever our service (re)starts, using a Fuseki module to trigger this.  (Code 
for this is at 
https://github.com/telicent-oss/smart-cache-graph/blob/d79ec280e7d7cf210b8cdd88c326533e3f5eb20f/scg-system/src/main/java/io/telicent/core/FMod_InitialCompaction.java
 if anyone is interested)

But if a database is unchanged then a compaction, while relatively fast, is 
wholly unnecessary.  Ideally, we’d like to proactively monitor and detect when 
a database needs compaction, so we avoid these unnecessary compactions.

A crude approach we’re using for now is to record the size of the database 
directory on disk when the server starts, then compare that original size with 
the current size and use the difference as an indicator that a compaction 
might be needed.  This is obviously not ideal, and it doesn’t account for the 
fact that some of the database files are pre-allocated sparse files, so their 
contents could have changed even if the size on disk hasn’t.
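
For illustration, here is a minimal sketch of that directory-size check using 
only the JDK.  This is not the code from our module; the class name, method 
names and the growth threshold are all made up for the example.  It also sums 
logical file sizes via Files.size(), which I'm assuming is good enough here and 
which has exactly the sparse-file blind spot described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jena or Fuseki.
public class DirectorySizeCheck {

    /** Sums the logical sizes of all regular files under the given directory. */
    public static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /**
     * Returns true when the current size has grown to at least
     * growthThreshold times the baseline (e.g. 1.5 means 50% growth).
     * The threshold is an arbitrary tuning knob, assumed for this sketch.
     */
    public static boolean mayNeedCompaction(long baselineBytes, long currentBytes,
                                            double growthThreshold) {
        if (baselineBytes <= 0) {
            // No usable baseline: flag anything non-empty.
            return currentBytes > 0;
        }
        return (double) currentBytes / baselineBytes >= growthThreshold;
    }
}
```

The idea would be to call directorySize() once at startup to capture the 
baseline, call it again on a schedule, and only trigger a compaction when 
mayNeedCompaction() returns true.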

A similarly crude approach would be to count the number of quads in a database 
but that’s not necessarily a reliable indicator as the additions and deletes 
could balance out over time to leave the same number of quads (unlikely but 
possible).

I know TDB2 tracks the generation of the tree structures internally due to its 
use of MVCC data structures and how it provides transaction isolation, so one 
way might be to read that information somehow (if that’s even possible).  I 
assume it must be stored in one or more of the database files, but I’m not sure 
where to start looking for it, nor whether that information is in a form that 
would allow detecting changes.

Has anyone else automated TDB2 compactions in any way, or does anyone have any 
ideas on this they can share?

Cheers,

Rob Vesse
