Hi All,

Is there any way to detect when a TDB2 database needs compaction?

Because of how updates arrive in our system via Kafka, we see quite large disk usage over time, so we're trying to automate periodic compaction of the databases to keep this under control. Right now we just compact whenever our service (re)starts, using a Fuseki module to trigger this. (Code for this is at https://github.com/telicent-oss/smart-cache-graph/blob/d79ec280e7d7cf210b8cdd88c326533e3f5eb20f/scg-system/src/main/java/io/telicent/core/FMod_InitialCompaction.java if anyone is interested.) But if a database is unchanged then a compaction, while relatively fast, is wholly unnecessary.

Ideally, we'd like to proactively monitor and detect when a database needs compaction, so that we avoid these unnecessary compactions.

A crude check we use for now is to measure the size of the database directory on disk when the server starts, compare that original size with the current size, and treat a difference as an indicator that a compaction might be needed. This is obviously not ideal, and it doesn't account for the fact that some of the database files are pre-allocated sparse files, so their contents could have changed even if the size on disk hasn't.

A similarly crude approach would be to count the number of quads in a database, but that's not necessarily a reliable indicator, as additions and deletions could balance out over time and leave the same number of quads (unlikely but possible).

I know TDB2 tracks the generation of its tree structures internally, because it uses MVCC data structures to provide transaction isolation, so one option might be to read that information somehow (if that's even possible). I assume it must be stored in one or more of the database files, but I'm not sure where to start looking for it, or whether it is in a form that would allow detecting changes.

Has anyone else automated TDB2 compactions in any way, or have any ideas on this they can share?

Cheers,

Rob Vesse
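
For illustration, the directory-size check described above could look roughly like the following. This is a minimal sketch, not the code from the linked Fuseki module: the names DirSizeCheck, directorySize and mightNeedCompaction are hypothetical, it uses only java.nio.file, and treating any difference in size as "changed" is an assumption. It sums logical file lengths, so it has the same blind spot with pre-allocated files mentioned above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: not part of Jena, Fuseki, or the linked module.
final class DirSizeCheck {

    private DirSizeCheck() {
    }

    /**
     * Sums the logical length in bytes of every regular file under dir,
     * recursing into subdirectories.
     */
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(DirSizeCheck::sizeOf)
                        .sum();
        } catch (UncheckedIOException e) {
            // Unwrap so callers only have to handle IOException.
            throw e.getCause();
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Assumption: any difference between the size recorded at startup and
     * the size now is treated as a hint that a compaction may be worthwhile.
     */
    static boolean mightNeedCompaction(long baselineBytes, long currentBytes) {
        return currentBytes != baselineBytes;
    }
}
```

The baseline would be recorded once at startup with directorySize(databaseDir) and compared against a fresh measurement on whatever schedule suits the deployment; a threshold (for example, growth beyond some percentage) may be a more practical trigger than strict inequality.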