On 14/10/2016 17:09, "Andy Seaborne" <a...@apache.org> wrote:
I don't understand what capabilities are enabled by transaction granularity if there are multiple transactions in a single patch. Concrete examples of where it helps?

However, I've normally been working with one transaction per patch anyway. Allowing multiple transactions per patch is for making a collection of (semantically) related changes into a unit, by consolidating small patches into "today's changes" (c.f. git squash). Leaving the transaction boundaries in gives internal checkpoints, not just one big transaction. It also makes the consolidated patch decomposable (unlike squash).

Internal checkpoints are useful not just for keeping the transaction manageable but also for restarting a very large update if it fails part way through for system reasons (server power cut, user reboots laptop by accident, ...). Imagine keeping a DBpedia copy up to date.

I think the thought is that a producer of a patch can decide whether each transaction being recorded should be reversible or not. For example, if you are importing a very large dataset into an already large database, you probably don’t want to slow down the import process by having to check whether every triple/quad is already in the database as you import it. Therefore you might choose to output a non-reversible transaction for performance reasons. On the other hand, if you’re accepting a small change to the data, then that cost is probably acceptable and you would output a reversible transaction.

I am not arguing that you shouldn’t have transaction boundaries (in fact I think they are essential), but simply that you may want to be able to annotate the properties of a transaction beyond just stating the boundaries.
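
To make that concrete, here is a rough sketch in Python of what a per-transaction annotation could look like. Everything in it is made up for illustration: the `reversible` flag, the `Transaction` class and the `record`/`undo`/`apply_patch` functions are assumptions of mine, not part of any existing patch format or API, and the store is just an in-memory set of triples. The point is only that the reversible path pays for a membership check per triple so the recorded changes can be undone exactly, while the non-reversible path skips that cost.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Transaction:
    adds: List[Triple] = field(default_factory=list)
    deletes: List[Triple] = field(default_factory=list)
    reversible: bool = False  # hypothetical per-transaction annotation


def record(store: Set[Triple], adds: Iterable[Triple],
           deletes: Iterable[Triple], reversible: bool) -> Transaction:
    """Apply a change to `store` and return the transaction to write out."""
    adds, deletes = list(adds), list(deletes)
    if reversible:
        # Pay for a lookup per triple so only real deletions are recorded.
        deletes = [t for t in deletes if t in store]
    store.difference_update(deletes)
    if reversible:
        # Likewise record only triples that were not already present.
        adds = [t for t in adds if t not in store]
    store.update(adds)
    return Transaction(adds, deletes, reversible)


def undo(store: Set[Triple], txn: Transaction) -> None:
    """Reverse one transaction; only safe if it was recorded as reversible."""
    if not txn.reversible:
        raise ValueError("transaction was not recorded as reversible")
    store.difference_update(txn.adds)
    store.update(txn.deletes)


def apply_patch(store: Set[Triple], patch: List[Transaction],
                start: int = 0) -> int:
    """Apply transactions from index `start`, e.g. to resume after a failure.

    Returns the index of the next transaction to apply.
    """
    for txn in patch[start:]:
        store.difference_update(txn.deletes)
        store.update(txn.adds)
    return len(patch)
```

With something like this, a small edit could be recorded with `reversible=True` and later undone, while a bulk import would be recorded with `reversible=False` and `undo` would refuse it. The `start` argument to `apply_patch` is where the transaction boundaries act as checkpoints: a consumer that got through the first N transactions before a crash can resume from N.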