Re: RDF Patch - experiences suggesting changes

A. Soroka Thu, 20 Oct 2016 06:21:43 -0700

For my cases, I would like intra-patch transactions because I have several 
different possible implementations of "patch"-- in other words, a patch might 
be an HTTP request, a section of a journal on a filesystem, a feed from a queue 
between time x and time y, an isolated file, etc. Having an independent notion 
of transaction would let me easily keep a common entity (transaction) in my 
systems even though the concrete manifestation of "patch" is varying.
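
A minimal sketch of that idea, assuming a line-oriented patch in which "TB" opens a transaction and "TC" commits it (the markers mentioned further down this thread). The "TA" abort marker, the row syntax, and the function name `transactions` are assumptions made for illustration, not part of any agreed format. The source of the lines (HTTP body, journal section, queue feed, file) does not matter to the consumer:

```python
from typing import Iterable, Iterator, List


def transactions(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the change rows of each committed transaction.

    Assumed markers: "TB" begins a transaction, "TC" commits it,
    "TA" aborts it. Rows outside a transaction are ignored here.
    """
    current = None  # rows of the open transaction, or None if none is open
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        code = line.split(None, 1)[0]
        if code == "TB":
            current = []
        elif code == "TC":
            if current is not None:
                yield current
            current = None
        elif code == "TA":
            current = None  # discard the rows of an aborted transaction
        elif current is not None:
            current.append(line)
```

Any iterable of strings can be passed in, so the transaction stays the common entity while the concrete form of "patch" varies.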


---
A. Soroka
The University of Virginia Library

> On Oct 20, 2016, at 8:50 AM, Andy Seaborne <[email protected]> wrote:
> 
> 
> 
> On 19/10/16 10:51, Rob Vesse wrote:
>> On 14/10/2016 17:09, "Andy Seaborne" <[email protected]> wrote:
>> 
>>    I don't understand what capabilities are enabled by transaction
>>    granularity if there are multiple transactions in a single patch.
>>    Concrete examples of where it helps?
>> 
>>    However, I've normally been working with one transaction per patch anyway.
>> 
>>    Allowing multiple transactions per patch is for making a collection of
>>    (semantically) related changes into a unit, by consolidating small
>>    patches into "today's changes" (c.f. git squash).
>> 
>>    Leaving the transaction boundaries in gives internal checkpoints, not
>>    just one big transaction. It also makes the consolidated patch
>>    decomposable (unlike squash).
>> 
>>    Internal checkpoints are useful not just for keeping the transaction
>>    manageable but also to be able to restart a very large update in case it
>>    failed part way through for system reasons (server power cut, user
>>    reboots laptop by accident, ...)  Imagine keeping a DBpedia copy up to 
>> date.
>> 
>> I think the thought is that a producer of a patch can decide whether
>> each transaction being recorded should be reversible or not. For
>> example if you are adding a very large dataset to an already large database
>> you probably don’t want to slow down the import process by having to
>> check whether every triple/quad is already in the database as you
>> import it. Therefore you might choose to output a non-reversible
>> transaction for performance reasons.
>> 
>> On the other hand if you’re accepting a small change to the data then
>> that cost is probably acceptable and you would output a reversible
>> transaction.
>> 
>> I am not arguing that you shouldn’t have transaction boundaries, in
>> fact I think they are essential, but simply that you may want to be
>> able to annotate the properties of a transaction beyond just stating the
>> boundaries.
> 
> Rob,
> 
> I agree the producer needs to have control.  What I am asking is why one 
> patch unit (packet) would have multiple transactions with different 
> characteristics in it.  The properties of a patch packet include reversibility 
> of contents. A patch overall isn't reversible unless each transaction within 
> it is, so there is now an opportunity for errors.
> 
> I think the patch packet as a unit is enough - it is supposed to be a sensible set 
> of changes to move the dataset from one consistent state to another.  In 
> developing that set of changes, there may have been several transactions 
> (c.f. git squash).  It happens to give a checkpoint effect on large patches 
> as well.
> 
> Analogy that may not help: a "TB/TC" is a database-transaction and a "patch" 
> is more like a "business transaction".
> 
> 
> (The use of "transaction" may not be the best - "action"? but with a need for 
> "abort" as well as "commit", "transaction"
> 
>       Andy
