Andy Seaborne created JENA-1379:

             Summary: Replace TDB NodeTableTrans
                 Key: JENA-1379
             Project: Apache Jena
          Issue Type: Bug
          Components: TDB
    Affects Versions: Jena 3.4.0
            Reporter: Andy Seaborne
            Assignee: Andy Seaborne

TDB {{NodeTableTrans}} is complicated. It combines an existing {{NodeTable}} 
with an additional index (often in-memory) and a journal-like {{ObjectFile}} to 
hold new nodes added in a transaction. It has to maintain a mapping between the 
new nodes in the journal-ObjectFile and the eventual location on the main node 
file. On commit, it writes the journal-ObjectFile nodes to the underlying index. 
The problem is that this index write is not completely safe, although the 
window of vulnerability (coordinating the index update and the object file 
update) is quite small.

{{NodeTableBuilder}} is part of the way TDB datasets get built. A simpler 
design is to make {{NodeTable}}s be built from the basic components on 
{{BlockMgr}}s and {{ObjectFile}}s (the two units of storage in TDB) in a fixed 
fashion. The potential flexibility of the current design has never been used.

There are two independent parts to this change:

# A transactional index (based on the same machinery as the tuple indexes), with 
direct appends to the object file of the {{NodeTable}}.
# An independent transactional object file.

Directly appending is safe because these files only grow. Only nodes in the 
associated index are accessible.  Abort resets the append point; a crash during 
a write transaction can, at worst, create unused junk in the object file, but 
this is a trade-off between speed and recovery. A journalled object file for 
additions would avoid junk in some crash situations, though it imposes a copy 
cost. The proposal is to go for simple+speed: "simpler" is easier to make 
crash-safe.
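
The append-and-reset behaviour described above can be sketched roughly as follows. This is a minimal in-memory model, not TDB code: the class {{AppendOnlyNodeStore}} and all of its methods are hypothetical names, a byte array stands in for the object file, and a map stands in for the transactional index. It only illustrates why bytes past the committed point are harmless: nothing in the index refers to them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model (not a TDB class): append-only object file plus an index. */
class AppendOnlyNodeStore {
    private byte[] data = new byte[64];   // stands in for the object file
    private int appendPoint = 0;          // where the next node is written
    private int committedPoint = 0;       // append point at the last commit
    // Index: node -> {offset, length}. Only indexed nodes are reachable.
    private final Map<String, int[]> index = new HashMap<>();
    // Index entries made by the current write transaction, not yet visible.
    private final Map<String, int[]> pending = new HashMap<>();

    /** Append a node directly to the file; it stays invisible until commit. */
    void add(String node) {
        if (index.containsKey(node) || pending.containsKey(node)) return;
        byte[] bytes = node.getBytes(StandardCharsets.UTF_8);
        int needed = appendPoint + bytes.length;
        if (needed > data.length)
            data = Arrays.copyOf(data, Math.max(data.length * 2, needed));
        System.arraycopy(bytes, 0, data, appendPoint, bytes.length);
        pending.put(node, new int[] { appendPoint, bytes.length });
        appendPoint = needed;
    }

    /** Publish the index entries and advance the committed point. */
    void commit() {
        index.putAll(pending);
        pending.clear();
        committedPoint = appendPoint;
    }

    /** Drop pending entries and reset the append point; old bytes get overwritten later. */
    void abort() {
        pending.clear();
        appendPoint = committedPoint;
    }

    /** Read a node back through the index; null if it was never committed. */
    String fetch(String node) {
        int[] loc = index.get(node);
        if (loc == null) return null;
        return new String(data, loc[0], loc[1], StandardCharsets.UTF_8);
    }

    int appendPoint() { return appendPoint; }
}
```

In this model a crash between {{add}} and {{commit}} corresponds to losing {{pending}} while the bytes remain in {{data}}: they are junk, but unreachable.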

The alternative here is not to keep the existing code - there is some unused 
(and hence not deployment-tested) code in {{ObjectFileTransComplex}} (working 
name) for a more complicated journalled object file.

The on-disk format is not changed, except that the "dat-jrnl" files used up to 
Jena 3.4.0 no longer exist. The presence of one indicates that crash recovery 
is needed. The safest way is to require that recovery is done with the same 
version of TDB, with a test in the new code that notices and exits if it 
encounters old files. Oddly, old code should recover new-version datasets 
correctly! All the work has been moved to the main index journal.
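
The "notice and exit" test might look something like the sketch below. It is an illustration only: {{OldJournalCheck}} and its methods are hypothetical, and the glob {{*.dat-jrnl}} is an assumption about how the old journal files are named in the database directory. Only standard {{java.nio.file}} calls are used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical start-up check (not a TDB class). */
class OldJournalCheck {
    /** List files in dbDir matching the assumed old-journal pattern "*.dat-jrnl". */
    static List<Path> findOldJournals(Path dbDir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dbDir, "*.dat-jrnl")) {
            for (Path p : files) found.add(p);
        }
        return found;
    }

    /** Refuse to continue if any old-style journal file is present. */
    static void checkNoOldJournals(Path dbDir) throws IOException {
        List<Path> old = findOldJournals(dbDir);
        if (!old.isEmpty())
            throw new IllegalStateException(
                "Old node journal files found; recover with the TDB version that wrote them: " + old);
    }
}
```

Throwing rather than attempting recovery keeps the new code free of any logic for the old journal layout.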

