Re: dataset assembler for JENA-624

A. Soroka Thu, 05 Nov 2015 09:09:47 -0800

Okay, as per comments on the PR itself, I have removed all use of 
ja:defaultGraph and ja:graph and simplified to just plain ja:data.


There is no more functionality to create a complex model (e.g. an inferring 
model) and copy it into the in-memory dataset, which both simplifies the code 
and will simplify the documentation. There are only the basic use cases as Andy 
outlines them below.
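
As a minimal sketch of the simplified form, reusing the prefix and file names from Andy's earlier example quoted below (an illustration of dataset-level ja:data only, not copied from the PR):

-----------------------
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .

# data.trig is a quads file: loaded into the dataset as a whole.
# data1.ttl is a triples file: loaded into the default graph.
<test:simpleExample>  a  ja:MemoryDataset ;
       ja:data          <file:data.trig> ;
       ja:data          <file:data1.ttl> .
-----------------------

Both ja:data lines correspond to RDFDataMgr.read(dataset, file), i.e. cases 1a and 1b in Andy's list.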

---
A. Soroka
The University of Virginia Library

> On Nov 5, 2015, at 11:49 AM, Andy Seaborne wrote:
> 
> On 05/11/15 16:11, A. Soroka wrote:
>> On Nov 5, 2015, at 11:02 AM, Andy Seaborne wrote:
>>> On 05/11/15 15:44, A. Soroka wrote:
>>>> Yeah, I basically copied the “parallel semantics” of ja:graph and 
>>>> ja:defaultGraph from DatasetAssembler. Perhaps I misunderstood them?
>>> If you think it's a problem, do you have a better name?
>> 
>> I’m now a bit confused about what they mean and what ja:data is supposed to 
>> mean, and I don’t want to spread my confusion any further! {grin} To (try 
>> to) get some clarity: are they supposed to have the same meaning (they do 
>> now in my PR, and I really think they do in DatasetAssembler)? If they are 
>> supposed to have a different meaning, and one of those meanings is “load 
>> from this URI”, what is the difference in that and the newly-introduced 
>> ja:data predicate?
>> 
> 
> They are different.
> 
> If you call assembler.open for the object of ja:defaultGraph it will create a 
> model. It does not know about datasets at that point.  You get a regular 
> model in-memory.
> 
> Adding to the in-mem dataset with addNamedModel or setDefaultModel will be a 
> copy into the datastructures of the in-mem dataset.
> 
> The use cases for direct file loading are:
> 
> 
> 1/ Load file into dataset.
> RDFDataMgr.read(dataset, file)
>  Case 1a: quads
>  Case 1b: triples
> 
> 
> 2/ Load file to graph
>  Case 2a: default graph
>       RDFDataMgr.read(dataset.getDefaultModel(), file)
>  Case 2b: named graph
>       RDFDataMgr.read(dataset.getNamedModel(graphName), file)
> 
> Note: When asked to read triples and it's a quads file, the RDFDataMgr 
> outputs just the triples from the default graph of the input (it does not 
> know it's reading into a named graph - it's just a destination of a stream of 
> triples).
> 
>       Andy
> 
> 
> 
> 
>>>> 
>>>> I will definitely gin up some file-based examples. Is there a standard way 
>>>> to do that in Jena tests? I.e. should I put documents in the test 
>>>> classpath, or something else?
>>> 
>>> See all the testing/ directories.
>> 
>> Thanks! I’ll get myself up to speed on this style of testing.
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> 
>>> They could go in src/test/resources but as they are loaded as plain files 
>>> from testing/ it's more realistic.
>>> 
>>> For lots of files and manifests it's easier.
>>> 
>>> And the base URI issues of class loaded resources are horrible.
>>> 
>>>     Andy
>>> 
>>>> 
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>> 
>>>>> On Nov 5, 2015, at 10:35 AM, Andy Seaborne wrote:
>>>>> 
>>>>> Hi there,
>>>>> 
>>>>> I got ja:data working but ja:graph eluded me.  After some code digging, 
>>>>> it seems it is a synonym for ja:defaultGraph but ja:defaultGraph goes to 
>>>>> a full-blown model description.
>>>>> 
>>>>> Tests from files would be good and they can be used in documentation 
>>>>> examples as well.
>>>>> 
>>>>> -----------------------
>>>>> @prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> 
>>>>> <test:simpleExample>  a  ja:MemoryDataset ;
>>>>>        ja:data          <file:data.trig> ;
>>>>>        ja:data          <file:data1.ttl> ;
>>>>>        ja:graph        [ ja:data <file:data2.ttl> ] ;
>>>>>        ja:graph
>>>>>            [ ja:graphName  <http://example/g3> ;
>>>>>              ja:data <file:data3.ttl> ] ;
>>>>>        .
>>>>> -----------------------
>>>>> 
>>>>> and I tried:
>>>>> 
>>>>>   DatasetFactory.assemble("assembler.ttl") ;
>>>>> 
>>>>> OK - wiring not in place yet
>>>>> then
>>>>> 
>>>>>    Model m = RDFDataMgr.loadModel("assembler.ttl") ;
>>>>>    Resource r = m.createResource("test:simpleExample") ;
>>>>>    Dataset ds = (Dataset)new InMemDatasetAssembler().open(r) ;
>>>>> 
>>>>> I was expecting ja:graph to be a direct description - not needing a 
>>>>> ja:MemoryModel description
>>>>> 
>>>>> ja:graph is needed to associate a name with a triples file.
>>>>> 
>>>>> I took a go at an implementation:
>>>>> ---------------------
>>>>> 
>>>>> open( ...) {
>>>>>   ...
>>>>>   // Instead of: final Resource defaultGraphDef =
>>>>> 
>>>>>   multiValueResource(root, pGraph)
>>>>>         .forEach(r->readGraphDesc(dataset, r));
>>>>>   ...
>>>>> 
>>>>> 
>>>>> 
>>>>> }
>>>>> 
>>>>> // May need better checking for expected Resource/String things.
>>>>> private void readGraphDesc(Dataset dataset, Resource r) {
>>>>>     String gn = null ;
>>>>>     if ( r.hasProperty(pGraphName) ) {
>>>>>         Resource rgn = r.getProperty(pGraphName).getResource() ;
>>>>>         gn = rgn.getURI() ;
>>>>>     }
>>>>>     String dataFn = getAsStringValue(r, data) ;
>>>>>     if ( gn == null )
>>>>>         RDFDataMgr.read(dataset.getDefaultModel(), dataFn) ;
>>>>>     else
>>>>>         RDFDataMgr.read(dataset.getNamedModel(gn), dataFn) ;
>>>>> }
>>>>> ---------------------
>>>>> 
>>>>>   Andy
>>>>> 
>>>>> On 02/11/15 16:44, A. Soroka wrote:
>>>>>> I’ve added “direct data links” in the style Andy outlines below to this 
>>>>>> JENA-624 PR, with tests.
>>>>>> 
>>>>>> Feedback eagerly desired!
>>>>>> 
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>> 
>>>>>>> On Oct 30, 2015, at 3:05 PM, Andy Seaborne wrote:
>>>>>>> 
>>>>>>>> Does this seem like a reasonable approach? It should allow users to
>>>>>>>> build up their data by whatever means they like, including using
>>>>>>>> inferencing models to generate assertions, then add them to the
>>>>>>>> in-memory container.
>>>>>>> 
>>>>>>> That's an interesting possibility that hadn't occurred to me.  It's a 
>>>>>>> copy-in and the implications of that will need to be clear.
>>>>>>> 
>>>>>>> Just relying on ja:MemoryModel and ja:externalContent for the cases of 
>>>>>>> loading files risks a load-copy though.  It could be "optimized".
>>>>>>> 
>>>>>>> What about allowing files to be directly pointed to from the assembler 
>>>>>>> for the ja:MemoryDataset?
>>>>>>> 
>>>>>>> Example:
>>>>>>> 
>>>>>>> -----------------------------------
>>>>>>> @prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>>> 
>>>>>>> <test:simpleExample>  a  ja:MemoryDataset ;
>>>>>>>        ja:data          <file:///some/data.trig> ;
>>>>>>>        ja:data          <file:///some/data.ttl> ;
>>>>>>>        ja:graph         [ ja:data <file:///some/data2.ttl> ] ;
>>>>>>>        ja:graph
>>>>>>>             [ ja:graphName  <test:namedGraphExample> ;
>>>>>>>               ja:data <file:///some/data3.ttl> ] .
>>>>>>> -----------------------------------
>>>>>>> 
>>>>>>> The ability to load trig, NQuads in the dataset:
>>>>>>> 
>>>>>>>        ja:data          <file:///some/data.trig> ;
>>>>>>> 
>>>>>>> If a triples form is given, it loads the default graph. This is what 
>>>>>>> Jena does already.  These are both RDFDataMgr.read(dataset, file) ;
>>>>>>> 
>>>>>>> The ja:graph for specific graphs:
>>>>>>> Default graph (uniformity):
>>>>>>> 
>>>>>>>        ja:graph         [ ja:data <file:///some/data2.ttl> ] ;
>>>>>>> 
>>>>>>> and named graphs
>>>>>>>        ja:graph
>>>>>>>                [ ja:graphName  <test:namedGraphExample> ;
>>>>>>>                  ja:data <file:///some/data3.ttl> ] .
>>>>>>> 
>>>>>>> I like the build-copy idea - these also add the core tasks of loading a 
>>>>>>> file into a  dataset in a direct way.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>>         Andy
>>>>>>> 
>>>>>>>> 
>>>>>>>> --- A. Soroka The University of Virginia Library
>>>>>>> 
>>>>>>> Prefixes !!!!!!!!!!!!!1
>>>>>>> 
>>>>>>> @prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>>> 
>>>>>>> <test:simpleExample>  a  ja:MemoryDataset ;
>>>>>>>        ja:defaultGraph  <test:defaultGraphDef> ;
>>>>>>>        ja:namedGraph    <test:namedGraphDef> .
>>>>>>> 
>>>>>>> <test:defaultGraphDef>
>>>>>>>        a           ja:MemoryModel ;
>>>>>>>        ja:content  [ ja:externalContent  <file:///some/triples.nt> ] .
>>>>>>> 
>>>>>>> <test:namedGraphDef>  a  ja:MemoryModel ;
>>>>>>>        ja:content
>>>>>>>             [ ja:externalContent  <file:///some/other/triples.nt> ] ;
>>>>>>>        ja:graphName  <test:namedGraphExample> .
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
