[
https://issues.apache.org/jira/browse/JENA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115436#comment-16115436
]
ASF GitHub Bot commented on JENA-1379:
--------------------------------------
GitHub user afs opened a pull request:
https://github.com/apache/jena/pull/272
JENA-1379: Better (simpler, more robust) transactional NodeTables
See [JENA-1379](https://issues.apache.org/jira/browse/JENA-1379) for more
details.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/afs/jena tdb-nodetable-txn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/272.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #272
----
commit e6e1b16aaca2c433120d61f2d7ad4edaaa1e22cf
Author: Andy Seaborne <[email protected]>
Date: 2017-08-04T16:14:33Z
Build from ObjectFiles and BlockMgrs.
Remove NodeTableBuilder
Remove NodeTableTrans
----
> Replace TDB NodeTableTrans
> --------------------------
>
> Key: JENA-1379
> URL: https://issues.apache.org/jira/browse/JENA-1379
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Affects Versions: Jena 3.4.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
>
> TDB {{NodeTableTrans}} is complicated. It combines an existing {{NodeTable}}
> with an additional index (often in-memory) and a journal-like {{ObjectFile}}
> to hold new nodes added in a transaction. It has to maintain a mapping
> between the new nodes in the journal-ObjectFile and the eventual location on
> the main node file. On commit, it writes the journal-ObjectFile nodes to the
> underlying index. The problem is that writing the index is not done
> completely safely, although the window of vulnerability (coordinating the
> index update and the object file update) is quite small.
> {{NodeTableBuilder}} is part of the way TDB datasets get built. A simpler
> design is to have {{NodeTable}}s built from the basic components,
> `BlockMgr`s and `ObjectFile`s (the two units of storage in TDB), in a fixed
> fashion. The potential flexibility of the current design has never been
> exploited.
> There are two independent parts to this change:
> # a transactional index (based on the same machinery as the tuple indexes)
> and directly appending to the object file of the {{NodeTable}}.
> # an independent transactional object file.
> Directly appending is safe because these files only grow. Only nodes in the
> associated index are accessible. Abort resets the append point; a crash
> during a write transaction can, at worst, create unused junk in the object
> file, but this is a trade-off between speed and recovery. A journalled addition
> object file would avoid junk in some crash situations, though it imposes a
> copy cost. It is proposed to go for simple+speed. "Simpler" is easier to make
> crash-safe.
> The alternative here is not to keep the existing code - there is some unused
> (and hence not deployment-tested) code in {{ObjectFileTransComplex}} (working
> name) for a more complicated journalled object file.
> The on-disk format is not changed except that the existing (up to Jena 3.4.0)
> "dat-jrnl" files no longer exist. The presence of such a file indicates that
> crash recovery is needed. The safest way is to require that recovery is done
> with the same version of TDB, with a test in the new code that notices and
> exits if it encounters old files. Oddly, old code should recover new version
> datasets correctly! All the work has been moved to the main index journal.
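
The direct-append scheme described in the issue (append nodes to the object file, make them reachable only through the index, reset the append point on abort) can be illustrated with a minimal in-memory sketch. This is not the TDB code: the class `AppendOnlyNodeTableSketch` and all of its methods are hypothetical names, a `StringBuilder` stands in for the on-disk object file, and plain maps stand in for the transactional index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an append-only node store; not the TDB NodeTable API. */
public class AppendOnlyNodeTableSketch {
    // Stand-in for the object file: it only ever grows, except on abort.
    private final StringBuilder objectFile = new StringBuilder();
    // Stand-in for the committed index: node -> offset in the object file.
    private final Map<String, Integer> index = new HashMap<>();
    // Index entries added by the current write transaction.
    private final Map<String, Integer> txnIndex = new HashMap<>();
    private int appendPointAtBegin = 0;
    private boolean inTxn = false;

    public void begin() {
        if (inTxn) throw new IllegalStateException("already in a transaction");
        appendPointAtBegin = objectFile.length(); // remember where to reset to
        txnIndex.clear();
        inTxn = true;
    }

    /** Append the node directly to the object file and index it in this transaction. */
    public int add(String node) {
        if (!inTxn) throw new IllegalStateException("no transaction");
        Integer existing = lookup(node);
        if (existing != null) return existing;
        int offset = objectFile.length();
        objectFile.append(node).append('\n');
        txnIndex.put(node, offset);
        return offset;
    }

    /** Only nodes present in the index are accessible. */
    public Integer lookup(String node) {
        Integer offset = index.get(node);
        if (offset == null && inTxn) offset = txnIndex.get(node);
        return offset;
    }

    public void commit() {
        if (!inTxn) throw new IllegalStateException("no transaction");
        index.putAll(txnIndex); // publish the new entries
        txnIndex.clear();
        inTxn = false;
    }

    public void abort() {
        if (!inTxn) throw new IllegalStateException("no transaction");
        objectFile.setLength(appendPointAtBegin); // reset the append point
        txnIndex.clear();
        inTxn = false;
    }

    /** Simulated crash: appended bytes remain as junk, index entries are lost. */
    public void simulateCrash() {
        txnIndex.clear();
        inTxn = false;
    }

    public int fileLength() {
        return objectFile.length();
    }
}
```

In this sketch an aborted transaction leaves the file at its previous length, while a simulated crash leaves unreferenced bytes in the file that no index entry points to, which is the "unused junk" case the issue accepts in exchange for speed.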
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)