So, the question is: should I go ahead and create a library of StreamRDF implementations in the extras section? I could see one doing serialization over Kafka (or other queue implementations), for example.
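For concreteness, a sketch of what one such implementation might look like: a StreamRDF that publishes each triple to a Kafka topic, one message per triple. This is hypothetical, not an existing Jena or jena-extras class; the class name, the producer configuration, and the one-line serialization via FmtUtils are all illustrative choices.

    // Hypothetical: publish each triple to a Kafka topic as one message.
    import java.util.Properties;

    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.system.StreamRDFBase;
    import org.apache.jena.sparql.util.FmtUtils;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaStreamRDF extends StreamRDFBase {

        private final Producer<String, String> producer;
        private final String topic;

        public KafkaStreamRDF(String bootstrapServers, String topic) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            this.producer = new KafkaProducer<>(props);
            this.topic = topic;
        }

        @Override
        public void triple(Triple triple) {
            // One message per triple, in a one-line SPARQL-style rendering.
            producer.send(new ProducerRecord<>(topic,
                    FmtUtils.stringForTriple(triple) + " ."));
        }

        @Override
        public void finish() {
            producer.flush();   // make sure everything is on the wire
            producer.close();
        }
    }

A quad(Quad) override and a more rigorous serialization (e.g. via the RIOT writers) would follow the same pattern; StreamRDFBase supplies no-op defaults for the rest of the StreamRDF contract.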
On Mon, Jul 8, 2019 at 5:56 PM Claude Warren <[email protected]> wrote:

> The case I was trying to solve was reading a largish XML document and
> converting it to an RDF graph. After a few iterations I ended up writing
> a custom SAX parser that calls the RDFStream triple/quad methods. But I
> wanted a way to update a Fuseki server, so RDFConnection seemed like the
> natural choice.
>
> In some recent work for my employer I found that I like RDFConnection,
> as the same code can work against a local dataset or a remote one.
>
> Claude
>
> On Mon, Jul 8, 2019 at 4:34 PM ajs6f <[email protected]> wrote:
>
>> This "replay" buffer approach was the direction I first went in for
>> TIM, until turning to MVCC (speaking of MVCC, that code is probably
>> somewhere, since we don't squash when we merge). Looking back, one
>> thing that helped me move on was the potential effect of very large
>> transactions. But in a controlled situation like Claude's, that
>> problem wouldn't arise.
>>
>> ajs6f
>>
>> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne <[email protected]> wrote:
>> >
>> > Claude,
>> >
>> > Good timing!
>> >
>> > This is what RDF Delta does, and for updates rather than just
>> > StreamRDF additions, though it's not to an RDFConnection - it's to a
>> > patch service.
>> >
>> > With hindsight, I wonder if that would have been better as a
>> > BufferingDatasetGraph - a DSG that keeps changes and makes the view
>> > of the buffer and the underlying DatasetGraph behave correctly
>> > (find* works and has the right cardinality of results). It's a bit
>> > fiddly to get it all right, but once it works it is a building block
>> > with a lot of re-usability.
>> >
>> > I came across this in the SHACL work with a BufferingGraph (with
>> > prefixes) to give "abort" of transactions to simple graphs which
>> > aren't transactional.
>> >
>> > But it occurs in Fuseki with complex dataset setups, like rules.
>> >
>> > Andy
>> >
>> > On 08/07/2019 11:09, Claude Warren wrote:
>> >> I have written an RDFStream to RDFConnection with caching.
>> >> Basically, the stream caches triples/quads until a limit is reached
>> >> and then it writes them to the RDFConnection. At finish it writes
>> >> any triples/quads left in the cache to the RDFConnection.
>> >> Internally I cache the stream in a dataset. I write triples to the
>> >> default graph and quads as appropriate.
>> >> I have a couple of questions:
>> >> 1) In this arrangement, what does the "base" tell me? I currently
>> >> ignore it and want to make sure I haven't missed something.
>> >
>> > The parser saw a BASE statement.
>> >
>> > Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
>> > concatenated).
>> >
>> > It's not necessary, because the data stream should have resolved
>> > IRIs in it, so base is not normally used in a stream.
>> >
>> >> 2) I capture all the prefix calls in a PrefixMapping that is
>> >> accessible from the RDFConnectionStream class. They are not passed
>> >> into the dataset in any way. I didn't see any method to do so and
>> >> don't really think it is needed. Does anyone see a problem with
>> >> this?
>> >> 3) Does anyone have a use for this class? If so, I am happy to
>> >> contribute it, though the next question becomes which module to put
>> >> it in. Perhaps we should have an extras package for RDFStream
>> >> implementations?
>> >> Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
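A minimal sketch of the buffering stream Claude describes (not his actual code; the class name RDFConnectionStream comes from the thread, while the method names and the use of loadDataset for each flush are assumptions):

    // Sketch: buffer triples/quads in an in-memory Dataset; push the batch
    // through the RDFConnection when a size limit is hit, and once more at
    // finish(). Triples go to the default graph, quads to named graphs.
    import org.apache.jena.graph.Triple;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.riot.system.StreamRDFBase;
    import org.apache.jena.sparql.core.Quad;

    public class RDFConnectionStream extends StreamRDFBase {

        private final RDFConnection connection;
        private final int limit;

        private Dataset buffer = DatasetFactory.create();
        private long count = 0;

        public RDFConnectionStream(RDFConnection connection, int limit) {
            this.connection = connection;
            this.limit = limit;
        }

        @Override
        public void triple(Triple triple) {
            buffer.asDatasetGraph().getDefaultGraph().add(triple);
            if (++count >= limit)
                flush();
        }

        @Override
        public void quad(Quad quad) {
            buffer.asDatasetGraph().add(quad);
            if (++count >= limit)
                flush();
        }

        @Override
        public void finish() {
            flush();   // write whatever is still buffered
        }

        private void flush() {
            if (count == 0)
                return;
            // loadDataset appends; putDataset would replace the remote data.
            connection.loadDataset(buffer);
            buffer = DatasetFactory.create();
            count = 0;
        }
    }

Because the sink is an RDFConnection, the same stream works against a local dataset or a remote Fuseki endpoint, which is the point Claude makes above:

    // Hypothetical usage; swap connect(url) for
    // RDFConnectionFactory.connect(localDataset) to stay in-process.
    try (RDFConnection conn =
            RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        RDFDataMgr.parse(new RDFConnectionStream(conn, 10_000), "data.ttl");
    }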
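Finally, the BufferingDatasetGraph Andy wonders about might take roughly this shape: a wrapper that records adds and deletes against a base DatasetGraph and serves a merged view. This is only a skeleton (all names illustrative), and it deliberately glosses over the fiddly part Andy mentions - every find/contains variant, and result cardinality, must be handled consistently:

    // Skeleton: buffered adds/deletes over a base DatasetGraph. Only one
    // find() overload is shown; flush() applies the changes, abort()
    // discards them - the "abort" for non-transactional storage.
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    import org.apache.jena.atlas.iterator.Iter;
    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphWrapper;
    import org.apache.jena.sparql.core.Quad;

    public class BufferingDatasetGraph extends DatasetGraphWrapper {

        private final DatasetGraph base;
        private final Set<Quad> added   = new HashSet<>();
        private final Set<Quad> deleted = new HashSet<>();

        public BufferingDatasetGraph(DatasetGraph base) {
            super(base);
            this.base = base;
        }

        @Override
        public void add(Quad quad) {
            deleted.remove(quad);
            if (!base.contains(quad))
                added.add(quad);
        }

        @Override
        public void delete(Quad quad) {
            added.remove(quad);
            if (base.contains(quad))
                deleted.add(quad);
        }

        @Override
        public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
            // Base results minus buffered deletes, plus matching buffered adds.
            Iterator<Quad> fromBase =
                Iter.filter(base.find(g, s, p, o), q -> !deleted.contains(q));
            Iterator<Quad> fromBuffer =
                added.stream().filter(q -> matches(q, g, s, p, o)).iterator();
            return Iter.concat(fromBase, fromBuffer);
        }

        private static boolean matches(Quad q, Node g, Node s, Node p, Node o) {
            return slot(g, q.getGraph()) && slot(s, q.getSubject())
                && slot(p, q.getPredicate()) && slot(o, q.getObject());
        }

        private static boolean slot(Node pattern, Node value) {
            return pattern == null || pattern == Node.ANY || pattern.equals(value);
        }

        // Push buffered changes down to the base dataset.
        public void flush() {
            deleted.forEach(base::delete);
            added.forEach(base::add);
            added.clear();
            deleted.clear();
        }

        // Throw buffered changes away.
        public void abort() {
            added.clear();
            deleted.clear();
        }
    }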
