So, the question is: should I go ahead and create a library of StreamRDF implementations in the extras section? I could see one doing serialization over Kafka (or other queue implementations), for example.
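For concreteness, a sketch of what one such implementation might look like: a StreamRDF that publishes each triple to a Kafka topic, one message per triple. This is hypothetical, not an existing Jena or jena-extras class; the class name, the producer configuration, and the one-line serialization via FmtUtils are all illustrative choices.

    // Hypothetical: publish each triple to a Kafka topic as one message.
    import java.util.Properties;

    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.system.StreamRDFBase;
    import org.apache.jena.sparql.util.FmtUtils;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaStreamRDF extends StreamRDFBase {

        private final Producer<String, String> producer;
        private final String topic;

        public KafkaStreamRDF(String bootstrapServers, String topic) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            this.producer = new KafkaProducer<>(props);
            this.topic = topic;
        }

        @Override
        public void triple(Triple triple) {
            // One message per triple, in a one-line SPARQL-style rendering.
            producer.send(new ProducerRecord<>(topic,
                    FmtUtils.stringForTriple(triple) + " ."));
        }

        @Override
        public void finish() {
            producer.flush();   // make sure everything is on the wire
            producer.close();
        }
    }

A quad(Quad) override and a more rigorous serialization (e.g. via the RIOT writers) would follow the same pattern; StreamRDFBase supplies no-op defaults for the rest of the StreamRDF contract.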
On Mon, Jul 8, 2019 at 5:56 PM Claude Warren <[email protected]> wrote:

> The case I was trying to solve was reading a largish XML document and
> converting it to an RDF graph. After a few iterations I ended up writing
> a custom SAX parser that calls the RDFStream triple/quad methods. But I
> wanted a way to update a Fuseki server, so RDFConnection seemed like the
> natural choice.
>
> In some recent work for my employer I found that I like RDFConnection,
> as the same code can work against a local dataset or a remote one.
>
> Claude
>
> On Mon, Jul 8, 2019 at 4:34 PM ajs6f <[email protected]> wrote:
>
>> This "replay" buffer approach was the direction I first went in for
>> TIM, until turning to MVCC (speaking of MVCC, that code is probably
>> somewhere, since we don't squash when we merge). Looking back, one
>> thing that helped me move on was the potential effect of very large
>> transactions. But in a controlled situation like Claude's, that
>> problem wouldn't arise.
>>
>> ajs6f
>>
>> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne <[email protected]> wrote:
>> >
>> > Claude,
>> >
>> > Good timing!
>> >
>> > This is what RDF Delta does, and for updates rather than just
>> > StreamRDF additions, though it's not to an RDFConnection - it's to a
>> > patch service.
>> >
>> > With hindsight, I wonder if that would have been better as a
>> > BufferingDatasetGraph - a DSG that keeps changes and makes the view
>> > of the buffer and the underlying DatasetGraph behave correctly
>> > (find* works and has the right cardinality of results). It's a bit
>> > fiddly to get it all right, but once it works it is a building block
>> > with a lot of re-usability.
>> >
>> > I came across this in the SHACL work with a BufferingGraph (with
>> > prefixes) to give "abort" of transactions to simple graphs which
>> > aren't transactional.
>> >
>> > But it occurs in Fuseki with complex dataset setups, like rules.
>> >
>> > Andy
>> >
>> > On 08/07/2019 11:09, Claude Warren wrote:
>> >> I have written an RDFStream to RDFConnection with caching.
>> >> Basically, the stream caches triples/quads until a limit is reached
>> >> and then it writes them to the RDFConnection. At finish it writes
>> >> any triples/quads left in the cache to the RDFConnection.
>> >> Internally I cache the stream in a dataset. I write triples to the
>> >> default graph and quads as appropriate.
>> >> I have a couple of questions:
>> >> 1) In this arrangement, what does the "base" tell me? I currently
>> >> ignore it and want to make sure I haven't missed something.
>> >
>> > The parser saw a BASE statement.
>> >
>> > Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
>> > concatenated).
>> >
>> > It's not necessary, because the data stream should have resolved
>> > IRIs in it, so base is not normally used in a stream.
>> >
>> >> 2) I capture all the prefix calls in a PrefixMapping that is
>> >> accessible from the RDFConnectionStream class. They are not passed
>> >> into the dataset in any way. I didn't see any method to do so and
>> >> don't really think it is needed. Does anyone see a problem with
>> >> this?
>> >> 3) Does anyone have a use for this class? If so, I am happy to
>> >> contribute it, though the next question becomes which module to put
>> >> it in. Perhaps we should have an extras package for RDFStream
>> >> implementations?
>> >> Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
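A minimal sketch of the buffering stream Claude describes (not his actual code; the class name RDFConnectionStream comes from the thread, while the method names and the use of loadDataset for each flush are assumptions):

    // Sketch: buffer triples/quads in an in-memory Dataset; push the batch
    // through the RDFConnection when a size limit is hit, and once more at
    // finish(). Triples go to the default graph, quads to named graphs.
    import org.apache.jena.graph.Triple;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.riot.system.StreamRDFBase;
    import org.apache.jena.sparql.core.Quad;

    public class RDFConnectionStream extends StreamRDFBase {

        private final RDFConnection connection;
        private final int limit;

        private Dataset buffer = DatasetFactory.create();
        private long count = 0;

        public RDFConnectionStream(RDFConnection connection, int limit) {
            this.connection = connection;
            this.limit = limit;
        }

        @Override
        public void triple(Triple triple) {
            buffer.asDatasetGraph().getDefaultGraph().add(triple);
            if (++count >= limit)
                flush();
        }

        @Override
        public void quad(Quad quad) {
            buffer.asDatasetGraph().add(quad);
            if (++count >= limit)
                flush();
        }

        @Override
        public void finish() {
            flush();   // write whatever is still buffered
        }

        private void flush() {
            if (count == 0)
                return;
            // loadDataset appends; putDataset would replace the remote data.
            connection.loadDataset(buffer);
            buffer = DatasetFactory.create();
            count = 0;
        }
    }

Because the sink is an RDFConnection, the same stream works against a local dataset or a remote Fuseki endpoint, which is the point Claude makes above:

    // Hypothetical usage; swap connect(url) for
    // RDFConnectionFactory.connect(localDataset) to stay in-process.
    try (RDFConnection conn =
            RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        RDFDataMgr.parse(new RDFConnectionStream(conn, 10_000), "data.ttl");
    }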
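Finally, the BufferingDatasetGraph Andy wonders about might take roughly this shape: a wrapper that records adds and deletes against a base DatasetGraph and serves a merged view. This is only a skeleton (all names illustrative), and it deliberately glosses over the fiddly part Andy mentions - every find/contains variant, and result cardinality, must be handled consistently:

    // Skeleton: buffered adds/deletes over a base DatasetGraph. Only one
    // find() overload is shown; flush() applies the changes, abort()
    // discards them - the "abort" for non-transactional storage.
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    import org.apache.jena.atlas.iterator.Iter;
    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphWrapper;
    import org.apache.jena.sparql.core.Quad;

    public class BufferingDatasetGraph extends DatasetGraphWrapper {

        private final DatasetGraph base;
        private final Set<Quad> added   = new HashSet<>();
        private final Set<Quad> deleted = new HashSet<>();

        public BufferingDatasetGraph(DatasetGraph base) {
            super(base);
            this.base = base;
        }

        @Override
        public void add(Quad quad) {
            deleted.remove(quad);
            if (!base.contains(quad))
                added.add(quad);
        }

        @Override
        public void delete(Quad quad) {
            added.remove(quad);
            if (base.contains(quad))
                deleted.add(quad);
        }

        @Override
        public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
            // Base results minus buffered deletes, plus matching buffered adds.
            Iterator<Quad> fromBase =
                Iter.filter(base.find(g, s, p, o), q -> !deleted.contains(q));
            Iterator<Quad> fromBuffer =
                added.stream().filter(q -> matches(q, g, s, p, o)).iterator();
            return Iter.concat(fromBase, fromBuffer);
        }

        private static boolean matches(Quad q, Node g, Node s, Node p, Node o) {
            return slot(g, q.getGraph()) && slot(s, q.getSubject())
                && slot(p, q.getPredicate()) && slot(o, q.getObject());
        }

        private static boolean slot(Node pattern, Node value) {
            return pattern == null || pattern == Node.ANY || pattern.equals(value);
        }

        // Push buffered changes down to the base dataset.
        public void flush() {
            deleted.forEach(base::delete);
            added.forEach(base::add);
            added.clear();
            deleted.clear();
        }

        // Throw buffered changes away.
        public void abort() {
            added.clear();
            deleted.clear();
        }
    }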
