On 14/10/2016 17:09, "Andy Seaborne" <[email protected]> wrote:
I don't understand what capabilities are enabled by transaction
granularity if there are multiple transactions in a single patch.
Concrete examples of where it helps?
However, I've normally been working with one transaction per patch anyway.
Allowing multiple transactions per patch is for making a collection of
(semantically) related changes into a unit, by consolidating small
patches into "today's changes" (cf. git squash).
Leaving the transaction boundaries in gives internal checkpoints, not
just one big transaction. It also makes the consolidated patch
decomposable (unlike squash).
Internal checkpoints are useful not just for keeping the transaction
manageable but also for restarting a very large update if it fails
part way through for system reasons (server power cut, user reboots
laptop by accident, ...). Imagine keeping a DBpedia copy up to date.
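
To make the restart point concrete, here is a minimal sketch. It assumes a purely hypothetical in-memory model in which a patch is a list of transactions and each transaction is a list of ("A" | "D", triple) changes. The names apply_patch, start_at and fail_before are invented for illustration and are not part of any patch format or library.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical model, for illustration only.
Triple = Tuple[str, str, str]
Change = Tuple[str, Triple]      # ("A", triple) adds, ("D", triple) deletes
Transaction = List[Change]


def apply_patch(store: Set[Triple], patch: List[Transaction],
                start_at: int = 0, fail_before: Optional[int] = None) -> int:
    """Apply transactions from index start_at onwards.

    Returns the checkpoint: the index of the next transaction to apply.
    fail_before simulates a crash just before that transaction starts.
    """
    checkpoint = start_at
    for i in range(start_at, len(patch)):
        if fail_before is not None and i == fail_before:
            return checkpoint            # simulated power cut
        staged = set(store)              # work on a copy: all-or-nothing
        for op, triple in patch[i]:
            if op == "A":
                staged.add(triple)
            elif op == "D":
                staged.discard(triple)
            else:
                raise ValueError(f"unknown operation: {op!r}")
        store.clear()
        store.update(staged)             # "commit" the transaction
        checkpoint = i + 1               # internal checkpoint reached
    return checkpoint


t1 = ("ex:s1", "ex:p", "ex:o1")
t2 = ("ex:s2", "ex:p", "ex:o2")
patch = [[("A", t1)], [("A", t2)], [("D", t1)]]

store: Set[Triple] = set()
first_checkpoint = apply_patch(store, patch, fail_before=2)
after_crash = set(store)
final_checkpoint = apply_patch(store, patch, start_at=first_checkpoint)
```

With a single big transaction the only possible restart point is the beginning. With the boundaries left in, the second call resumes at the third transaction.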
I think the thought is that a producer of a patch can decide whether each
transaction being recorded should be reversible or not. For example, if you are
importing a very large dataset into an already large database you probably don’t want to
slow down the import process by having to check whether every triple/quad is
already in the database as you import it. Therefore you might choose to output
a non-reversible transaction for performance reasons.
On the other hand if you’re accepting a small change to the data then that cost
is probably acceptable and you would output a reversible transaction.
I am not arguing that you shouldn’t have transaction boundaries, in fact I
think they are essential, but simply that you may want to be able to annotate the
properties of a transaction beyond just stating the boundaries.
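
As a rough sketch of what I mean by annotating a transaction, the following assumes a hypothetical recorder in which each recorded transaction carries a reversible flag. RecordedTxn, record_adds and undo are names made up for this example, and the store is just a Python set standing in for the database.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class RecordedTxn:
    reversible: bool                       # the per-transaction annotation
    changes: List[Tuple[str, Triple]] = field(default_factory=list)


def record_adds(store: Set[Triple], triples: Iterable[Triple],
                reversible: bool) -> RecordedTxn:
    """Add triples to the store and record the transaction."""
    txn = RecordedTxn(reversible=reversible)
    for t in triples:
        # The existence check is the cost a bulk import may want to skip.
        if reversible and t in store:
            continue                       # not a real change, do not record
        store.add(t)
        txn.changes.append(("A", t))
    return txn


def undo(store: Set[Triple], txn: RecordedTxn) -> None:
    """Reverse a transaction, refusing if it was not recorded as reversible."""
    if not txn.reversible:
        raise ValueError("transaction was not recorded as reversible")
    for op, t in reversed(txn.changes):
        if op == "A":
            store.discard(t)
        else:
            store.add(t)


t1 = ("ex:s1", "ex:p", "ex:o1")
t2 = ("ex:s2", "ex:p", "ex:o2")

small_store: Set[Triple] = {t1}
small_change = record_adds(small_store, [t1, t2], reversible=True)
undo(small_store, small_change)            # t1 survives, it was already there

bulk_store: Set[Triple] = {t1}
bulk_import = record_adds(bulk_store, [t1, t2], reversible=False)
```

The bulk import records both adds without checking, so blindly undoing it would wrongly delete t1. That is why a consumer needs to know which kind of transaction it has been handed.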