Currently I am using https://github.com/Claudenw/java-diff-utils (forked from https://github.com/dnaumenko/java-diff-utils -- no changes yet).

I start with the assumption that the datastore will always produce the same ID for a given blank node across queries. I assume the IDs will change if a node is deleted and reinserted, but as long as there is no change I assume they stay the same. If that assumption does not hold, the diff probably won't work correctly.

I basically perform a query against the 2 datasets to produce ordered g,s,p,o quads, and feed the results into the diff/patch routine. Currently, since the blank nodes are compared by id, in your first case (dataset2 has _:c) both triples would be deleted and a new one inserted, and in your second case (dataset2 has _:a) just one triple would be deleted.

The code is at https://github.com/Claudenw/rdf-diff-patch (sorry Andy, it has "rdf" and "patch" in the name -- I'll change it if I can find another good descriptor -- alternatively, we might be able to generate RDF-patch format output).

Use PatchFactory to create the patch object and UpdateFactory to create the UpdateRequest.

This code needs the recent fixes for jena-querybuilder 3.7.0-SNAPSHOT.

I have only been working on this for a couple of days and there are several places to improve it.

1. I think the diff/patch routine has some equality plugin points that might make it possible to match different blank node ids within a graph during the diff processing.

2. Since the patch generated by java-diff-utils would have both the delete and the insert quads, it should be possible to create models for each named graph in the quad list, perform some queries against them to remove any blank nodes that are the "same" (your choice of definition for "same"), and perform a mapping between old and new node ids.

There are lots of edge cases to explore here.

Claude

On Wed, Dec 27, 2017 at 4:26 PM, ajs6f <[email protected]> wrote:

> I'm curious too, Claude. Is the idea that one assumes that bnodes are
> already using the same pool of labels, or something like that? IOW, if I
> have dataset1:
>
> _:a a my:type .
> _:b a my:type .
>
> and dataset2:
>
> _:c a my:type .
>
> and I want to convert dataset1 into dataset2, will your algorithm delete
> both triples and add a new one, or just remove a triple, and if so, is that
> deterministic? If dataset2 is instead:
>
> _:a a my:type .
>
> will the algorithm only remove one triple and be done, or remove both and
> add a new one?
>
> ajs6f
>
> > On Dec 27, 2017, at 11:00 AM, Andy Seaborne <[email protected]> wrote:
> >
> > It would be interesting to see especially the handling of blank nodes
> > cycles and other structures.
> >
> > Please don't call it "RDF Patch" or a names similar to that - that term
> > is already used.
> >
> > Andy
> >
> > On 26/12/17 18:17, Claude Warren wrote:
> >> Howdy,
> >> I am working on a tool that can create UpdateRequests that will convert one
> >> Dataset into another.
> >> The basic idea is to extract the quads sorted by (g,s,p,o) and then perform
> >> a diff on the lists (like a text diff but each quad is a "line").
> >> The result is that I can create statements to delete insert and delete one
> >> dataset to make it "identical" to the other. Identical in this case means
> >> that each model in the two datasets are isomorphic.
> >> Is anyone else interested in this?
> >> Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
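
To make the two cases from the question concrete, here is a minimal sketch of comparing two sorted quad lists where each quad is treated as one "line". It is not the rdf-diff-patch code and does not use java-diff-utils or Jena: it swaps in a plain merge over two already-sorted lists, represents each quad as a plain string, and compares blank node labels literally. The class name `QuadDiffSketch` and the method `diff` are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: quads are plain strings, and both lists
// are assumed to be sorted in the same (g,s,p,o) order already.
class QuadDiffSketch {

    // Walk both sorted lists in step; anything only in the old list is a
    // DELETE, anything only in the new list is an INSERT.
    static List<String> diff(List<String> oldQuads, List<String> newQuads) {
        List<String> ops = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < oldQuads.size() && j < newQuads.size()) {
            int cmp = oldQuads.get(i).compareTo(newQuads.get(j));
            if (cmp == 0) {          // same quad on both sides: keep it
                i++;
                j++;
            } else if (cmp < 0) {    // only in the old dataset
                ops.add("DELETE " + oldQuads.get(i++));
            } else {                 // only in the new dataset
                ops.add("INSERT " + newQuads.get(j++));
            }
        }
        while (i < oldQuads.size()) {
            ops.add("DELETE " + oldQuads.get(i++));
        }
        while (j < newQuads.size()) {
            ops.add("INSERT " + newQuads.get(j++));
        }
        return ops;
    }

    public static void main(String[] args) {
        List<String> dataset1 = Arrays.asList("_:a a my:type .", "_:b a my:type .");

        // First case: dataset2 uses a different blank node label.
        System.out.println(diff(dataset1, Arrays.asList("_:c a my:type .")));
        // prints [DELETE _:a a my:type ., DELETE _:b a my:type ., INSERT _:c a my:type .]

        // Second case: dataset2 reuses the label _:a.
        System.out.println(diff(dataset1, Arrays.asList("_:a a my:type .")));
        // prints [DELETE _:b a my:type .]
    }
}
```

Because the comparison is purely by label, the result is deterministic but only as good as the stable-ID assumption; matching blank nodes by structure would need a different notion of equality than the `compareTo` used here.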
