Re: RDF Diff/patch

Claude Warren Wed, 27 Dec 2017 15:44:28 -0800

Currently I am using https://github.com/Claudenw/java-diff-utils (forked
from https://github.com/dnaumenko/java-diff-utils -- no changes yet).

I start with the assumption that the datastore will always produce the same
ID for a given blank node across queries. I assume the ID will change if the
node is deleted and reinserted, but as long as nothing changes it stays the
same. If that assumption does not hold, the diff probably won't work
correctly.

I basically run a query against each of the 2 datasets to produce quads
ordered by g,s,p,o.

I feed the results into the diff/patch routine.
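Below is a minimal sketch of that step in plain Java. It does not use the java-diff-utils API, and it assumes each quad has already been serialized into four strings. The `Quad`, `QuadDelta` and `QuadDiff` names are hypothetical.

Both lists are sorted by (g,s,p,o) and hold no duplicates, so their longest common subsequence is simply their intersection. A single merge pass therefore yields the same deletions and insertions a line diff would. The merge is only correct if both lists are sorted with the same comparator it uses, so the sketch assumes they were sorted in Java rather than relying on the store's ORDER BY.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical quad type: graph, subject, predicate, object as serialized terms.
record Quad(String g, String s, String p, String o) {
    // Orders quads by (g, s, p, o); both inputs must be sorted with this.
    static final Comparator<Quad> ORDER = Comparator
            .comparing(Quad::g)
            .thenComparing(Quad::s)
            .thenComparing(Quad::p)
            .thenComparing(Quad::o);
}

// Quads to delete from the source and quads to insert to reach the target.
record QuadDelta(List<Quad> deletes, List<Quad> inserts) {}

final class QuadDiff {
    private QuadDiff() {}

    // Merge-style diff over two duplicate-free lists sorted by Quad.ORDER.
    static QuadDelta diff(List<Quad> source, List<Quad> target) {
        List<Quad> deletes = new ArrayList<>();
        List<Quad> inserts = new ArrayList<>();
        int i = 0, j = 0;
        while (i < source.size() && j < target.size()) {
            int c = Quad.ORDER.compare(source.get(i), target.get(j));
            if (c == 0) {        // present in both: keep
                i++;
                j++;
            } else if (c < 0) {  // only in source: delete
                deletes.add(source.get(i++));
            } else {             // only in target: insert
                inserts.add(target.get(j++));
            }
        }
        // Anything left on either side has no counterpart.
        while (i < source.size()) deletes.add(source.get(i++));
        while (j < target.size()) inserts.add(target.get(j++));
        return new QuadDelta(deletes, inserts);
    }
}
```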

Currently, if the blank nodes have different IDs, both triples would be
deleted and the new one inserted in the first case, and just one deleted in
the second case.
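A diff over sorted, duplicate-free quads reduces to set difference. That means the two cases in the quoted question can be checked with a tiny helper that treats each serialized triple as one "line" and compares blank node labels as opaque strings. `LabelDiff` is hypothetical and only meant as an illustration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: for sorted, distinct quads a line diff reduces to
// set difference on the serialized quads (labels compared as plain strings).
final class LabelDiff {
    private LabelDiff() {}

    // Quads in the source but not in the target (to be deleted).
    static SortedSet<String> deletes(SortedSet<String> source, SortedSet<String> target) {
        SortedSet<String> out = new TreeSet<>(source);
        out.removeAll(target);
        return out;
    }

    // Quads in the target but not in the source (to be inserted).
    static SortedSet<String> inserts(SortedSet<String> source, SortedSet<String> target) {
        return deletes(target, source);
    }
}
```

The two cases work out as follows:

- With dataset1 = {`_:a a my:type .`, `_:b a my:type .`} and dataset2 = {`_:c a my:type .`}, both existing triples are deleted and the `_:c` triple is inserted.
- With dataset2 = {`_:a a my:type .`}, only the `_:b` triple is deleted.

The outcome is deterministic, but only because the labels are compared as plain strings.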

The code is at https://github.com/Claudenw/rdf-diff-patch (sorry Andy, I got
"rdf" and "patch" in the name; I'll change it if I can find another good
descriptor. Alternatively, we might be able to generate RDF Patch format
output).

Use PatchFactory to create the patch object and UpdateFactory to create the
UpdateRequest.
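PatchFactory and UpdateFactory are not reproduced here. As a rough, hypothetical stand-in, the sketch below renders lists of deleted and inserted quads as SPARQL 1.1 Update text. This is the kind of string Jena's `UpdateFactory.create(String)` can parse into an UpdateRequest.

The sketch makes two assumptions:

- Terms are already written in SPARQL syntax.
- Every quad sits in a named graph.

SPARQL 1.1 does not allow blank nodes in DELETE DATA, so the sketch rejects them. Removing blank-node quads would need DELETE/WHERE patterns or a label mapping instead. `UpdateText` and `Quad` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quad type: terms already in SPARQL syntax, e.g. "<http://ex/g>", "_:b0", "\"text\"".
record Quad(String g, String s, String p, String o) {
    boolean hasBlankNode() {
        return s.startsWith("_:") || o.startsWith("_:");
    }
}

final class UpdateText {
    private UpdateText() {}

    // Renders deletions, then insertions, as one SPARQL 1.1 Update request string.
    static String render(List<Quad> deletes, List<Quad> inserts) {
        List<String> ops = new ArrayList<>();
        if (!deletes.isEmpty()) {
            for (Quad q : deletes) {
                if (q.hasBlankNode()) {
                    // SPARQL 1.1 forbids blank nodes in DELETE DATA.
                    throw new IllegalArgumentException("blank node in DELETE DATA: " + q);
                }
            }
            ops.add("DELETE DATA {\n" + block(deletes) + "}");
        }
        if (!inserts.isEmpty()) {
            ops.add("INSERT DATA {\n" + block(inserts) + "}");
        }
        // Operations within one request are separated by ';'.
        return String.join(" ;\n", ops);
    }

    // Writes each quad as "GRAPH g { s p o . }" inside the data block.
    private static String block(List<Quad> quads) {
        StringBuilder sb = new StringBuilder();
        for (Quad q : quads) {
            sb.append("  GRAPH ").append(q.g()).append(" { ")
              .append(q.s()).append(' ').append(q.p()).append(' ').append(q.o())
              .append(" . }\n");
        }
        return sb.toString();
    }
}
```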

This code does need the recent fixes for jena-querybuilder 3.7.0-SNAPSHOT.

I have only been working on this for a couple of days and there are several
places to improve it.

   1. I think the diff/patch routine has some equality plugin points that
   might make it possible to match different blank node IDs within a graph
   during diff processing.
   2. Since the patch generated by java-diff-utils would contain both the
   delete and the insert quads, it should be possible to create a model for
   each named graph in the quad list, run some queries against them to
   remove any blank nodes that are the "same" (your choice of definition for
   "same"), and map between old and new node IDs.
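One way to approach item 2 is sketched here under simplifying assumptions. Within the delete and insert quad lists, each blank-node subject gets a signature: its graph plus its sorted outgoing (predicate, object) pairs, with blank-node objects abstracted away. A deleted blank node is paired with an inserted one when that signature picks out exactly one node on each side. Renamed deletions that then equal an insertion cancel out.

This is a crude, hypothetical choice of "same", not the method used in the repository. It has several gaps:

- It ignores incoming edges.
- It assumes each blank node label occurs in only one named graph.
- It skips blank nodes that never appear as a subject.
- It leaves cycles and ties unresolved, which is exactly where the edge cases live.

`BNodeMatcher`, `Quad` and `Remaining` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical quad type; blank nodes are serialized as "_:label".
record Quad(String g, String s, String p, String o) {}

// Deletions and insertions left over after matched blank nodes cancel out.
record Remaining(List<Quad> deletes, List<Quad> inserts) {}

final class BNodeMatcher {
    private BNodeMatcher() {}

    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    // Signature per blank-node subject: its graph plus sorted outgoing (p, o)
    // pairs, blank-node objects abstracted to "_:". Assumes one graph per label.
    static Map<String, String> signatures(List<Quad> quads) {
        Map<String, SortedSet<String>> edges = new HashMap<>();
        Map<String, String> graphOf = new HashMap<>();
        for (Quad q : quads) {
            if (!isBlank(q.s())) continue;
            String o = isBlank(q.o()) ? "_:" : q.o();
            edges.computeIfAbsent(q.s(), k -> new TreeSet<>()).add(q.p() + " " + o);
            graphOf.put(q.s(), q.g());
        }
        Map<String, String> sig = new HashMap<>();
        edges.forEach((node, pairs) -> sig.put(node, graphOf.get(node) + "|" + pairs));
        return sig;
    }

    // Maps deleted labels to inserted ones when a signature identifies exactly
    // one node on each side; ambiguous signatures stay unmapped.
    static Map<String, String> match(List<Quad> deletes, List<Quad> inserts) {
        Map<String, List<String>> oldBySig = bySignature(signatures(deletes));
        Map<String, List<String>> newBySig = bySignature(signatures(inserts));
        Map<String, String> mapping = new TreeMap<>();
        oldBySig.forEach((sig, olds) -> {
            List<String> news = newBySig.getOrDefault(sig, List.of());
            if (olds.size() == 1 && news.size() == 1) {
                mapping.put(olds.get(0), news.get(0));
            }
        });
        return mapping;
    }

    private static Map<String, List<String>> bySignature(Map<String, String> sig) {
        Map<String, List<String>> out = new HashMap<>();
        sig.forEach((node, s) -> out.computeIfAbsent(s, k -> new ArrayList<>()).add(node));
        return out;
    }

    // A deletion that equals an insertion after renaming cancels both; kept
    // deletions retain their original labels so they still match the store.
    static Remaining cancel(List<Quad> deletes, List<Quad> inserts, Map<String, String> mapping) {
        Set<Quad> insertSet = new HashSet<>(inserts);
        Set<Quad> cancelled = new HashSet<>();
        List<Quad> keptDeletes = new ArrayList<>();
        for (Quad q : deletes) {
            Quad renamed = new Quad(q.g(), mapping.getOrDefault(q.s(), q.s()),
                    q.p(), mapping.getOrDefault(q.o(), q.o()));
            if (insertSet.contains(renamed)) {
                cancelled.add(renamed);
            } else {
                keptDeletes.add(q);
            }
        }
        List<Quad> keptInserts = new ArrayList<>();
        for (Quad q : inserts) {
            if (!cancelled.contains(q)) keptInserts.add(q);
        }
        return new Remaining(keptDeletes, keptInserts);
    }
}
```

In the ambiguous case from the quoted question (two identical `_:x a my:type` triples deleted, one inserted), no mapping is produced, so the plain delete/insert result is left unchanged.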

There are lots of edge cases to explore here.

Claude

On Wed, Dec 27, 2017 at 4:26 PM, ajs6f wrote:

> I'm curious too, Claude. Is the idea that one assumes that bnodes are
> already using the same pool of labels, or something like that? IOW, if I
> have dataset1:
>
> _:a a my:type .
> _:b a my:type .
>
> and dataset2:
>
> _:c a my:type .
>
> and I want to convert dataset1 into dataset2, will your algorithm delete
> both triples and add a new one, or just remove a triple, and if so, is that
> deterministic? If dataset2 is instead:
>
> _:a a my:type .
>
> will the algorithm only remove one triple and be done, or remove both and
> add a new one?
>
> ajs6f
>
> > On Dec 27, 2017, at 11:00 AM, Andy Seaborne wrote:
> >
> > It would be interesting to see especially the handling of blank node
> > cycles and other structures.
> >
> > Please don't call it "RDF Patch" or a name similar to that; that term
> > is already used.
> >
> >    Andy
> >
> > On 26/12/17 18:17, Claude Warren wrote:
> >> Howdy,
> >> I am working on a tool that can create UpdateRequests that will convert
> >> one Dataset into another.
> >> The basic idea is to extract the quads sorted by (g,s,p,o) and then
> >> perform a diff on the lists (like a text diff, but each quad is a "line").
> >> The result is that I can create statements that delete from and insert
> >> into one dataset to make it "identical" to the other. Identical in this
> >> case means that each model in one dataset is isomorphic to the
> >> corresponding model in the other.
> >> Is anyone else interested in this?
> >> Claude
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
