The case I was trying to solve was reading a largish XML document and converting it to an RDF graph. After a few iterations I ended up writing a custom SAX parser that calls the StreamRDF triple/quad methods. But I also wanted a way to update a Fuseki server, so RDFConnection seemed like the natural choice.
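A minimal sketch of the SAX-to-stream idea, using only the JDK's SAX API. It is not the parser described above. `TripleSink` is a hypothetical stand-in for Jena's `StreamRDF`, and the mapping (each child of `<item id="...">` becomes one property with a literal value) and the `http://example.org/` namespace are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlToTriples {
    /** Hypothetical stand-in for Jena's StreamRDF: receives triples as they are produced. */
    interface TripleSink {
        void start();
        void triple(String s, String p, String o);
        void finish();
    }

    /** Maps <item id="x"><name>v</name></item> to (ex:x, ex:name, "v"). */
    static class ItemHandler extends DefaultHandler {
        private static final String NS = "http://example.org/"; // assumed namespace
        private final TripleSink sink;
        private String subject;                // IRI of the current <item>, or null
        private final StringBuilder text = new StringBuilder();

        ItemHandler(TripleSink sink) { this.sink = sink; }

        @Override public void startDocument() { sink.start(); }
        @Override public void endDocument() { sink.finish(); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("item")) {
                subject = NS + atts.getValue("id");
            }
            text.setLength(0); // collect character data for this element only
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("item")) {
                subject = null;
            } else if (subject != null) {
                // each child element of <item> becomes one triple with a literal object
                sink.triple(subject, NS + qName, text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Parses the XML and returns the emitted triples as simple N-Triples-like strings. */
    public static List<String> convert(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        TripleSink sink = new TripleSink() {
            public void start() {}
            public void triple(String s, String p, String o) {
                out.add("<" + s + "> <" + p + "> \"" + o + "\" .");
            }
            public void finish() {}
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new ItemHandler(sink));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item id=\"a\"><name>Alpha</name></item></items>";
        convert(xml).forEach(System.out::println);
    }
}
```

The handler never holds the whole document, which is the point of going through SAX for a large file: triples reach the sink as each element closes. Literal escaping is omitted for brevity.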
In some recent work for my employer I found that I like RDFConnection, as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f <[email protected]> wrote:

> This "replay" buffer approach was the direction I first went in for TIM, until turning to MVCC (speaking of MVCC, that code is probably somewhere, since we don't squash when we merge). Looking back, one thing that helped me move on was the potential effect of very large transactions. But in a controlled situation like Claude's, that problem wouldn't arise.
>
> ajs6f
>
> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne <[email protected]> wrote:
> >
> > Claude,
> >
> > Good timing!
> >
> > This is what RDF Delta does, and for updates rather than just StreamRDF additions, though it's not to an RDFConnection; it's to a patch service.
> >
> > With hindsight, I wonder if that would have been better as a BufferingDatasetGraph: a DSG that keeps changes and makes the view of the buffer and the underlying DatasetGraph behave correctly (find* works and has the right cardinality of results). It's a bit fiddly to get it all right, but once it works it is a building block that has a lot of re-usability.
> >
> > I came across this with the SHACL work, where a BufferingGraph (with prefixes) gives "abort" of transactions to simple graphs which aren't transactional.
> >
> > But it occurs in Fuseki with complex dataset setups like rules.
> >
> > Andy
> >
> > On 08/07/2019 11:09, Claude Warren wrote:
> >> I have written a StreamRDF to RDFConnection with caching. Basically, the stream caches triples/quads until a limit is reached and then it writes them to the RDFConnection. At finish it writes any triples/quads in the cache to the RDFConnection.
> >> Internally I cache the stream in a dataset. I write triples to the default graph and quads as appropriate.
> >> I have a couple of questions:
> >> 1) In this arrangement, what does the "base" tell me? I currently ignore it and want to make sure I haven't missed something.
> >
> > The parser saw a BASE statement.
> >
> > Like PREFIX, in Turtle it can happen mid-file (e.g. when files are concatenated).
> >
> > It's not necessary, because the IRIs in the data stream should already have been resolved against the base.
> >
> >> 2) I capture all the prefix calls in a PrefixMapping that is accessible from the RDFConnectionStream class. They are not passed into the dataset in any way. I didn't see any method to do so and don't really think it is needed. Does anyone see a problem with this?
> >> 3) Does anyone have a use for this class? If so, I am happy to contribute it, though the next question becomes what module to put it in. Perhaps we should have an extras package for StreamRDF implementations?
> >> Claude
> >

-- 
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
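A minimal sketch of the caching behaviour Claude describes (cache until a limit, write the batch, write the remainder at finish), under stated assumptions. `CachingStream`, the string-based `Quad` record and the `DEFAULT_GRAPH` marker are hypothetical, and the `Consumer` passed in stands in for whatever writes a batch to the RDFConnection; this JDK-only sketch does not call Jena.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CachingStream {
    /** A quad as four strings; triples are stored with g == DEFAULT_GRAPH. */
    record Quad(String g, String s, String p, String o) {}

    static final String DEFAULT_GRAPH = "urn:x-default"; // hypothetical marker for the default graph

    private final int limit;                    // flush once this many statements are cached
    private final Consumer<List<Quad>> flusher; // stand-in for writing a batch to an RDFConnection
    private final List<Quad> cache = new ArrayList<>();

    public CachingStream(int limit, Consumer<List<Quad>> flusher) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        this.limit = limit;
        this.flusher = flusher;
    }

    /** Triples are cached in the default graph. */
    public void triple(String s, String p, String o) { add(new Quad(DEFAULT_GRAPH, s, p, o)); }

    /** Quads keep their own graph name. */
    public void quad(String g, String s, String p, String o) { add(new Quad(g, s, p, o)); }

    /** Writes whatever is still cached. */
    public void finish() { flush(); }

    private void add(Quad q) {
        cache.add(q);
        if (cache.size() >= limit) flush();
    }

    private void flush() {
        if (cache.isEmpty()) return;
        flusher.accept(List.copyOf(cache)); // hand over an immutable snapshot of the batch
        cache.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        CachingStream stream = new CachingStream(2, batch -> batchSizes.add(batch.size()));
        stream.triple("ex:s", "ex:p", "ex:o1");
        stream.triple("ex:s", "ex:p", "ex:o2");        // reaches the limit: first batch of 2
        stream.quad("ex:g", "ex:s", "ex:p", "ex:o3");
        stream.finish();                               // flushes the remaining 1
        System.out.println(batchSizes);                // [2, 1]
    }
}
```

Copying the batch before handing it over means the cache can be cleared straight away, whatever the consumer does with the list.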

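Andy's BufferingDatasetGraph / BufferingGraph idea can be illustrated with a small overlay: additions and deletions are recorded separately from the underlying store, `find` returns the combined view with each item exactly once (the "right cardinality" point), and `abort` simply discards the buffer, which gives a non-transactional store an abort. This is a hypothetical simplification over a plain `Set<T>`, not Jena's `Graph` or `DatasetGraph` API; `BufferingSet` is an illustrative name and `find` takes a predicate instead of a triple pattern.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BufferingSet<T> {
    private final Set<T> base;                          // underlying, non-transactional store
    private final Set<T> added = new LinkedHashSet<>(); // visible in the view, not in base
    private final Set<T> deleted = new HashSet<>();     // in base, hidden from the view

    public BufferingSet(Set<T> base) { this.base = base; }

    public boolean contains(T t) {
        return added.contains(t) || (base.contains(t) && !deleted.contains(t));
    }

    public void add(T t) {
        if (contains(t)) return;              // already visible: never buffer a duplicate
        if (!deleted.remove(t)) added.add(t); // un-hide a deleted base item, else buffer it
    }

    public void delete(T t) {
        // drop a buffered addition, or hide an item that lives in base
        if (!added.remove(t) && base.contains(t)) deleted.add(t);
    }

    /** Base minus deletions, plus additions; the two parts are disjoint, so no duplicates. */
    public List<T> find(Predicate<? super T> pattern) {
        return Stream.concat(base.stream().filter(t -> !deleted.contains(t)), added.stream())
                .filter(pattern)
                .collect(Collectors.toList());
    }

    /** Applies the buffered changes to the base store. */
    public void commit() {
        base.removeAll(deleted);
        base.addAll(added);
        clearBuffers();
    }

    /** Discards the buffered changes; the base store was never touched. */
    public void abort() { clearBuffers(); }

    private void clearBuffers() {
        added.clear();
        deleted.clear();
    }

    public static void main(String[] args) {
        Set<String> base = new HashSet<>(Set.of("a", "b"));
        BufferingSet<String> view = new BufferingSet<>(base);
        view.add("a");    // already present, so not buffered a second time
        view.add("c");
        view.delete("b");
        System.out.println(new TreeSet<>(view.find(x -> true))); // [a, c]
        view.abort();
        System.out.println(new TreeSet<>(view.find(x -> true))); // [a, b]
    }
}
```

Keeping `added` disjoint from `base` is what makes `find` correct without a de-duplication pass; the fiddly parts Andy mentions (prefixes, patterns with wildcards, datasets of many graphs) are left out of this sketch.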