ASF GitHub Bot commented on JENA-1379:

GitHub user afs opened a pull request:


    JENA-1379: Better (simpler, more robust) transactional NodeTables

    See [JENA-1379](https://issues.apache.org/jira/browse/JENA-1379) for more 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afs/jena tdb-nodetable-txn

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #272

commit e6e1b16aaca2c433120d61f2d7ad4edaaa1e22cf
Author: Andy Seaborne <a...@apache.org>
Date:   2017-08-04T16:14:33Z

    Build from ObjectFiles and BlockMgrs.
    Remove NodeTableBuilder
    Remove NodeTableTrans


> Replace TDB NodeTableTrans
> --------------------------
>                 Key: JENA-1379
>                 URL: https://issues.apache.org/jira/browse/JENA-1379
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 3.4.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>
> TDB {{NodeTableTrans}} is complicated. It combines an existing {{NodeTable}} 
> with an additional index (often in-memory) and a journal-like {{ObjectFile}} 
> to hold new nodes added in a transaction. It has to maintain a mapping 
> between the new nodes in the journal-ObjectFile and their eventual location 
> in the main node file. On commit, it writes the journal-ObjectFile nodes to 
> the underlying index. The problem is that writing the index is not done 
> completely safely, although the window of vulnerability (coordinating the 
> index update and the object file update) is quite small.
>
> {{NodeTableBuilder}} is part of the way TDB datasets get built. A simpler 
> design is to build each {{NodeTable}} from the basic components, 
> {{BlockMgr}}s and {{ObjectFile}}s (the two units of storage in TDB), in a 
> fixed fashion. The potential flexibility of the current design has never 
> been exploited.
>
> There are two independent parts to this change:
> # a transactional index (based on the same machinery as the tuple indexes) 
> and directly appending to the object file of the {{NodeTable}}.
> # an independent transactional object file.
>
> Directly appending is safe because these files only grow, and only nodes in 
> the associated index are accessible. Abort resets the append point; a crash 
> during a write transaction can, at worst, create unused junk in the object 
> file, but this is a trade-off between speed and recovery. A journalled 
> object file for additions would avoid junk in some crash situations, though 
> it imposes a copy cost. It is proposed to go for simple+speed. "Simpler" is 
> easier to make crash-safe.
>
> The alternative here is not to keep the existing code: there is some unused 
> (and hence not deployment-tested) code in {{ObjectFileTransComplex}} (working 
> name) for a more complicated journalled object file.
>
> The on-disk format is not changed, except that the "dat-jrnl" files used up 
> to Jena 3.4.0 no longer exist. The presence of such a file indicates that 
> crash recovery is needed. The safest way is to require that recovery is done 
> with the same version of TDB, with a test in the new code that notices and 
> exits if it encounters old files. Oddly, old code should recover new-version 
> datasets correctly! All the work has been moved to the main index journal.
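
The direct-append idea in the description above (files only grow, only
indexed nodes are visible, abort resets the append point, a crash leaves at
worst unused junk) can be illustrated with a small in-memory sketch. This is
not Jena code: the class AppendOnlyNodeStoreSketch and all of its methods are
hypothetical names invented for illustration, a list stands in for the object
file, and a map stands in for the transactional index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for an append-only object file plus its index. */
final class AppendOnlyNodeStoreSketch {
    // Stand-in for the object file: entries are only ever appended.
    private final List<String> objectFile = new ArrayList<>();
    // Stand-in for the committed index: node -> position in the object file.
    private final Map<String, Integer> index = new HashMap<>();
    // Index entries added by the current write transaction, not yet visible.
    private final Map<String, Integer> pending = new HashMap<>();
    // Length of the object file when the transaction began; -1 means no transaction.
    private int appendPointAtBegin = -1;

    void begin() {
        if (appendPointAtBegin >= 0) throw new IllegalStateException("transaction already active");
        appendPointAtBegin = objectFile.size();
    }

    /** Appends the node directly to the object file unless it is already known. */
    int add(String node) {
        requireTxn();
        Integer pos = index.get(node);
        if (pos == null) pos = pending.get(node);
        if (pos != null) return pos;
        objectFile.add(node);
        int newPos = objectFile.size() - 1;
        pending.put(node, newPos);
        return newPos;
    }

    /** Commit publishes the pending index entries; the appended data is already in place. */
    void commit() {
        requireTxn();
        index.putAll(pending);
        pending.clear();
        appendPointAtBegin = -1;
    }

    /** Abort resets the append point, discarding everything appended in this transaction. */
    void abort() {
        requireTxn();
        objectFile.subList(appendPointAtBegin, objectFile.size()).clear();
        pending.clear();
        appendPointAtBegin = -1;
    }

    /** Simulated crash: pending index entries are lost, appended data stays as junk. */
    void crash() {
        pending.clear();
        appendPointAtBegin = -1;
    }

    /** Only committed nodes are reachable, because lookup goes through the index. */
    Integer lookup(String node) { return index.get(node); }

    int fileLength() { return objectFile.size(); }

    private void requireTxn() {
        if (appendPointAtBegin < 0) throw new IllegalStateException("no active transaction");
    }
}
```

In this sketch a crashed transaction leaves an unreachable entry in the list,
and a later transaction simply appends after it, which mirrors the "junk but
still safe" trade-off described in the issue.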

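The "test in new code that notices and exits if it encounters old files"
could look roughly like the sketch below. It is only an illustration under
stated assumptions: the class and method names are hypothetical, it assumes
the old journal files can be recognised by a file name ending in ".dat-jrnl"
inside the database directory, and it throws an exception where real code
might log and exit.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check for pre-change node journal files. */
final class OldJournalCheckSketch {

    /** Lists files in dbDir whose name ends with ".dat-jrnl" (assumed naming). */
    static List<Path> findOldNodeJournals(Path dbDir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dbDir, "*.dat-jrnl")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        return found;
    }

    /** Refuses to continue if any old journal file is present. */
    static void refuseIfOldJournals(Path dbDir) throws IOException {
        List<Path> old = findOldNodeJournals(dbDir);
        if (!old.isEmpty()) {
            throw new IllegalStateException(
                "Old node journal files found, recover with the older TDB version first: " + old);
        }
    }
}
```
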
This message was sent by Atlassian JIRA
