For my cases, I would like intra-patch transactions because I have several different possible implementations of "patch": a patch might be an HTTP request, a section of a journal on a filesystem, a feed from a queue between time x and time y, an isolated file, etc. Having an independent notion of transaction would let me easily keep a common entity (the transaction) across my systems even though the concrete manifestation of "patch" varies.
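A minimal sketch of that idea, under stated assumptions: the carrier only has to hand over an iterable of lines, and transactions are recovered from boundary markers. "TB" and "TC" are the markers mentioned in the quoted thread; the "TA" abort marker, the "irreversible" annotation on "TB", and every Python name here are hypothetical illustrations, not part of any existing patch library or agreed syntax.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

# Hypothetical types for illustration only.
Change = Tuple[str, str]  # (op, statement), e.g. ("A", "<s> <p> <o>")


@dataclass
class Transaction:
    changes: List[Change] = field(default_factory=list)
    reversible: bool = True  # property chosen by the patch producer


def parse_transactions(lines: Iterable[str]) -> Iterator[Transaction]:
    """Yield committed transactions from any line-based carrier.

    The carrier (HTTP body, journal section, queue slice, file) is
    irrelevant here: only the boundary markers matter.
    """
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(" ")
        if head == "TB":
            # Assumed annotation: "TB irreversible" marks a transaction
            # recorded without the checks needed to undo it.
            current = Transaction(reversible=(rest != "irreversible"))
        elif head == "TC":
            if current is not None:
                yield current
            current = None
        elif head == "TA":
            current = None  # assumed abort marker: discard the changes
        elif current is not None:
            current.changes.append((head, rest))
        # Lines outside any transaction are ignored in this sketch.


def patch_is_reversible(transactions: Iterable[Transaction]) -> bool:
    """A patch is reversible only if every transaction in it is."""
    return all(t.reversible for t in transactions)
```

The same `parse_transactions` call would work whether the lines come from `open(path)`, a decoded HTTP body split on newlines, or a list drained from a queue, which is the point of keeping the transaction independent of the patch.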
---
A. Soroka
The University of Virginia Library

> On Oct 20, 2016, at 8:50 AM, Andy Seaborne <[email protected]> wrote:
>
> On 19/10/16 10:51, Rob Vesse wrote:
>> On 14/10/2016 17:09, "Andy Seaborne" <[email protected]> wrote:
>>
>> I don't understand what capabilities are enabled by transaction
>> granularity if there are multiple transactions in a single patch.
>> Concrete examples of where it helps?
>>
>> However, I've normally been working with one transaction per patch anyway.
>>
>> Allowing multiple transactions per patch is for making a collection of
>> (semantically) related changes into a unit, by consolidating small
>> patches into "today's changes" (c.f. git squash).
>>
>> Leaving the transaction boundaries in gives internal checkpoints, not
>> just one big transaction. It also makes the consolidated patch
>> decomposable (unlike squash).
>>
>> Internal checkpoints are useful not just for keeping the transaction
>> manageable but also to be able to restart a very large update in case it
>> failed part way through for system reasons (server power cut, user
>> reboots laptop by accident, ...). Imagine keeping a DBpedia copy up to
>> date.
>>
>> I think the thought is that a producer of a patch can decide whether
>> each transaction being recorded should be reversible or not. For
>> example, if you are adding a very large dataset to an already large database
>> you probably don’t want to slow down the import process by having to
>> check whether every triple/quad is already in the database as you
>> import it. Therefore you might choose to output a non-reversible
>> transaction for performance reasons.
>>
>> On the other hand, if you’re accepting a small change to the data then
>> that cost is probably acceptable and you would output a reversible
>> transaction.
>>
>> I am not arguing that you shouldn’t have transaction boundaries, in
>> fact I think they are essential, but simply that you may want to be
>> able to annotate the properties of a transaction beyond just stating the
>> boundaries.
>
> Rob,
>
> I agree the producer needs to have control. What I am asking is why one
> patch unit (packet) would have multiple transactions with different
> characteristics in it. The properties of a patch packet include reversibility
> of contents. A patch overall isn't reversible unless each transaction within
> it is, so there is now an opportunity for errors.
>
> I think the unit of a patch packet is enough - it is supposed to be a sensible set
> of changes to move the dataset from one consistent state to another. In
> developing that set of changes, there may have been several transactions
> (c.f. git squash). It happens to give a checkpoint effect on large patches
> as well.
>
> Analogy that may not help: a "TB/TC" is a database-transaction and a "patch"
> is more like a "business transaction".
>
> (The use of "transaction" may not be the best - "action"? but with a need for
> "abort" as well as "commit", "transaction"
>
> Andy
