Re: rdfs:Graph ? comment on http://www.w3.org/TR/rdf11-concepts/#section-dataset and issue 35

Sandro Hawke Mon, 15 Jul 2013 17:39:43 -0700

Off-list but public response, so we can continue informally for a bit. (This is what I should have done the first time, given the current rather strict rules about public-rdf-comments, because it had gotten out of control.)


On 07/15/2013 11:14 AM, Jeremy J Carroll wrote:

Hi Sandro


to reply, in turn informally

I think you have largely understood my position, although - without introducing 
time, I have difficulty seeing much difference between your two concepts: 
rdf:WebviewDataset   and  rdf:DirectDataset.

Equality is another perfectly good way to distinguish them (we don't need time).


It follows from this dataset:

 <> a rdf:DirectDataset.
 GRAPH _:a { <s> <p> <o> }
 GRAPH _:b { <s> <p> <o> }

that _:a = _:b. Each blank node denotes the same g-snap, therefore they must be equal.


In contrast, this is totally different:

  <> a rdf:WebviewDataset.
  GRAPH <a> { <s> <p> <o> }
  GRAPH <b> { <s> <p> <o> }

Here we've said that <a> and <b> are each Web Resources (g-boxes) that happen to have the same triples. There's no reason to think they are equal.
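One rough way to picture the contrast between the two datasets above is the Python sketch below. It is an illustration under assumptions, not anything defined by the RDF specs: a g-snap is modelled as an immutable frozenset of triples, a g-box as an object with its own identity (the GBox class is hypothetical), and graph names are plain strings.

```python
# Hypothetical model, not part of any RDF spec: a g-snap is an immutable
# set of triples; a g-box is a mutable container with its own identity.

class GBox:
    """A Web Resource whose current state is a set of triples."""
    def __init__(self, triples):
        self.triples = set(triples)

triple = ("<s>", "<p>", "<o>")

# DirectDataset: each blank-node name denotes the g-snap itself.
direct = {"_:a": frozenset([triple]), "_:b": frozenset([triple])}

# WebviewDataset: each IRI name denotes a g-box that holds the triples.
webview = {"<a>": GBox([triple]), "<b>": GBox([triple])}

print(direct["_:a"] == direct["_:b"])                    # compares g-snaps
print(webview["<a>"] == webview["<b>"])                  # compares g-boxes
print(webview["<a>"].triples == webview["<b>"].triples)  # compares contents
```

The two g-snaps compare equal because a set is nothing more than its members, while the two g-boxes stay distinct even though their contents currently match.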

Since RDF is known not to do time very well, it doesn't surprise me that there 
may be difficulties in thinking about changes in the graph content in a 
dataset. I am really not expecting RDF semantics to address that.
I am expecting RDF to allow me to describe resources - specifically resources 
introduced by RDF; I am not expecting RDF to provide me the ability to make 
paradoxical statements about resources (which an overly excessive version of 
the rdf:DirectDataset view risks, see:
http://lists.w3.org/Archives/Public/www-rdf-logic/2004Apr/0029
)
I want to say simple stuff like who wrote a graph named in the dataset. The 
easiest way to do this is to attach the metadata to the name.


So, given this dataset:

   <> a rdf:DirectDataset.
   <> a rdf:WebviewDataset.
   GRAPH _:a { <s> <p> <o> }
   GRAPH _:b { <s> <p> <o> }
   GRAPH <a> { <s> <p> <o> }
   GRAPH <b> { <s> <p> <o> }
   _:a dc:creator "Alice".
   <a> dc:creator "Bob".

it follows that

   _:b dc:creator "Alice".

it does NOT follow that:

   <b> dc:creator "Bob".

Kind of important difference, right? And you can see why people would want both kinds of semantics.
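The entailment in the combined dataset above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: blank-node names are taken to denote the g-snap (the DirectDataset reading), every other name is taken to denote its own g-box (the WebviewDataset reading), and the function name creators_by_name is made up for the illustration.

```python
def creators_by_name(graphs, creators):
    """Propagate dc:creator to every name denoting the same thing.

    graphs:   name -> iterable of triples
    creators: name -> creator literal asserted on that name
    Assumption: blank-node names ("_:x") denote the g-snap, so names with
    equal triples share a denotation; any other name denotes its own g-box.
    """
    def denotation(name):
        if name.startswith("_:"):
            return ("g-snap", frozenset(graphs[name]))
        return ("g-box", name)

    by_denotation = {}
    for name, creator in creators.items():
        by_denotation.setdefault(denotation(name), set()).add(creator)
    return {name: by_denotation.get(denotation(name), set()) for name in graphs}

t = ("<s>", "<p>", "<o>")
graphs = {"_:a": [t], "_:b": [t], "<a>": [t], "<b>": [t]}
result = creators_by_name(graphs, {"_:a": "Alice", "<a>": "Bob"})
print(result["_:b"])  # creators that follow for _:b
print(result["<b>"])  # creators that follow for <b>
```

In this model the creator asserted on _:a carries over to _:b, while nothing is derived for <b>.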

(and they tell me they also want other things, like entailments within the named graphs, although I'm not convinced that's worth it.)

  This currently is not supported by RDF and I would like to have a clear technical 
explanation as to why, rather than a political rationale (which is of course totally 
understandable) "we didn't pick a semantics for datasets, because there are so many 
different ones out there already, so nothing we could pick wouldn't cause someone 
problems."

Is the above convincing? I agree there should be some kind of rationale in the documents, if possible.


     -- Sandro

Jeremy J Carroll
Principal Architect
Syapse, Inc.



On Jul 11, 2013, at 5:59 PM, Sandro Hawke <san...@w3.org> wrote:

On 07/11/2013 03:06 PM, Jeremy J Carroll wrote:

Hello

This is a formal comment on 
http://www.w3.org/TR/rdf11-concepts/#section-dataset, and it appears a comment 
on
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-schema/index.html
and quite possibly on the RDF Semantics ….

This is a brief, informal reply to both the message I'm replying to [1] and 
your following message [2].

The short answer is: we didn't pick a semantics for datasets, because there are 
so many different ones out there already, so nothing we could pick wouldn't 
cause someone problems.   So we say that datasets, on their own, have a minimal 
semantics plus application-specific semantic extensions.   If you want 
interoperability between applications, you need to indicate your semantic 
extensions.  You can do that out-of-band (in some way you figure out) or in 
band, by putting some metadata in the dataset saying which semantic extensions 
you're using.

We are hoping to produce a NOTE which provides some options, so people don't 
have to start from scratch with these indicators.   We don't think the subject 
is mature enough yet to put designs in a Recommendation, though.

My current thinking, which the group hasn't really talked about, is:

   <> a rdf:WebviewDataset   (Or ResourceStateDataset or GraphStoreSnapshot)

would provide the semantics I think you want, where a URL graph name is 
associated with the graph you'd get if you dereferenced that URL.   You might 
think of the URLs as denoting the Web Resource whose state is represented by 
the associated graph.   My sense from your examples is that's how you're 
thinking about datasets.

  <> a rdf:DirectDataset

would provide the semantics some other folks want, where the graph names 
actually denote the associated graphs (the pure mathematical set of triples, 
not a thing which can change over time).    This is what people are used to 
from N3 and (I think) from most provenance work.

I'm inclined to say DirectDataset only constrains name/graph pairs where the 
graph names are blank nodes and WebviewDataset only constrains name/graph pairs 
where the graph names are http(s) IRIs.   This would allow these two semantic 
extensions to be used together.   If you said:

  <> a rdf:WebviewDataset, rdf:DirectDataset.
  GRAPH _:a { <s> <p> 1 }
  GRAPH <b> { <s> <p> 2 }
  _:a eg:endorsedBy eg:sandro.
  <b> eg:endorsedBy eg:sandro.

Then you'd be saying I endorsed the statement {<s> <p> 1 } and I endorsed the (mutable) Web Resource 
<b>, whose contents happen to be { <s> <p> 2 }.     (On that latter bit, hopefully there will be 
some other metadata to help clarify *when* those are the contents of <b>, but we haven't figured out yet how 
to do that.)

Does that make any sense?   Does this change your comments?     I have to 
apologize for not having the NOTE drafted yet, and thus adding to the confusion.

      -- Sandro


[1] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0021.html
[2] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0022.html

It seems to be a suggestion to reopen issue 35
http://www.w3.org/2011/rdf-wg/track/issues/35
which points to
http://www.w3.org/TR/sparql11-service-description/
hence I am CC-ing dawg.
The last part of this message discusses problems in using service description 
to meet my use case: to me, this is not a comment on DAWG's work, but a comment 
that RDF Core cannot use DAWG's work of more limited scope to duck the issue.


Summary: I would like to use rdf to describe graphs in a dataset, e.g. to say 
who the author was.

as a simple example

my:graph {
    my:graph dc:creator "Jeremy J. Carroll" .
}

I cannot see how to do this with the current drafts, editors drafts, etc.

A possible approach would be to reopen issue 35  and have a class rdfs:Graph, s.t. 
for a <URI> used as the name of a graph in a dataset the triple
    <URI> rdf:type rdfs:Graph
holds.
More weakly, I would be satisfied with such a concept being added to the RDF 
vocabulary, without the implication above holding, but a suggested usage 
pattern.

Also, I basically finished this message before finding issue 35 and its 
superficially reasonable resolution that sd:Graph may meet my needs. This 
suggests that some documentation link from either RDF Concepts or RDF Schema or 
RDF Semantics to SPARQL Service Description would be helpful ….
However, the Service Description doc
http://www.w3.org/TR/sparql11-service-description/
ducks on the issue of whether the name denotes the graph, and so does not give 
me a clear place to put such metadata.
I think if the RDF WG tried writing such documentation, they would discover 
that the resolution of issue 35 would unravel - the trick is to allow such 
unravelling without having too much of the named graphs work unravel.

----


Here is my actual use case …..





I first give my motivation, then I give my weak suggestion.

Motivation:
=========

I referred to RDF Concepts 1.1 today because I am constructing an RDF dataset 
and wished to add metadata concerning the named graphs.
I am trying to articulate a multi tenant architecture over a SPARQL end point, 
in which each user is assigned to a specific organization, and then depending 
on this organization, they have access to different named graphs.

I wish to refer to the named graphs using the URI names I have assigned to 
them, and I wish to create my own property to add this metadata.


Concretely, my property might be
        syapse:owningOrganization

and the quads I was thinking of producing include

GRAPH <https://test.syapse.com/graph/syapse> {
     <https://test.syapse.com/graph/syapse> syapse:owningOrganization syapse: .
      syapse:owningOrganization rdf:type owl:FunctionalProperty .
      syapse:owningOrganization rdfs:range syapse:Organization .
      syapse:   rdf:type syapse:Organization .
      syapse:Organization rdf:type rdfs:Class .
     …
     …
}

GRAPH <https://test.syapse.com/graph/ontology/base> {
     <https://test.syapse.com/graph/ontology/base> syapse:owningOrganization syapse: .
     …
     …
}

GRAPH <https://test.syapse.com/graph/ontology/sys> {
     <https://test.syapse.com/graph/ontology/sys> syapse:owningOrganization syapse: .
     …
     …
}

GRAPH <https://test.syapse.com/graph/ontology/c2> {
     <https://test.syapse.com/graph/ontology/c2> syapse:owningOrganization <https://test.syapse.com/graph/southParkUniversity> .
     …
     …
}

GRAPH <https://test.syapse.com/graph/southParkUniversity/abox> {
     <https://test.syapse.com/graph/southParkUniversity/abox> syapse:owningOrganization <https://test.syapse.com/graph/southParkUniversity> .
     <https://test.syapse.com/graph/southParkUniversity> rdf:type syapse:Organization .
     …
     …
}


This allows me to run a privileged SPARQL query across the whole dataset to 
find out which graphs are assigned to which organization, and then knowing 
which organization a user is in, I can have application logic to determine 
which named graphs they can access, and restrict their queries to those named 
graphs.
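The application logic described above might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: quads are plain tuples rather than the result of a real privileged SPARQL query (which could be something like SELECT ?g ?org WHERE { GRAPH ?g { ?g syapse:owningOrganization ?org } }), the function names are invented, and the restriction is expressed with FROM NAMED clauses, which is only one of several ways to scope a query.

```python
OWNING_ORG = "syapse:owningOrganization"
SPU = "<https://test.syapse.com/graph/southParkUniversity>"

# (graph, subject, predicate, object): one self-describing quad per graph.
QUADS = [
    (g, g, OWNING_ORG, org)
    for g, org in [
        ("<https://test.syapse.com/graph/syapse>", "syapse:"),
        ("<https://test.syapse.com/graph/ontology/c2>", SPU),
        ("<https://test.syapse.com/graph/southParkUniversity/abox>", SPU),
    ]
]

def graphs_by_org(quads):
    """Map each organization to the graphs that declare it as owner.

    Only a graph's statement about itself (subject == graph name) counts,
    matching the self-describing pattern used in the quads.
    """
    owned = {}
    for graph, s, p, o in quads:
        if p == OWNING_ORG and s == graph:
            owned.setdefault(o, set()).add(graph)
    return owned

def restricted_query(pattern, user_org, owned):
    """Build a SELECT whose dataset is limited to the user's graphs."""
    graphs = sorted(owned.get(user_org, ()))
    if not graphs:
        # With no FROM NAMED clause the endpoint would fall back to its
        # default dataset, so refuse rather than emit an open query.
        raise PermissionError(f"no graphs for {user_org}")
    from_named = "\n".join(f"FROM NAMED {g}" for g in graphs)
    return f"SELECT *\n{from_named}\nWHERE {{ GRAPH ?g {{ {pattern} }} }}"

owned = graphs_by_org(QUADS)
print(restricted_query("?s ?p ?o", SPU, owned))
```

Note that the whole scheme depends on being able to state the owning organization against the graph name, which is exactly the metadata question raised in this message.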


Weak suggestion
==============

I read the very limited text in the dataset section, and the note as reflecting 
a victory for those who do not want the implication that the name of the graph 
is a graph to hold.
As a long-standing advocate of the other position in which, of course, it 
denotes … I am somewhat disappointed.

However, adding such a vocab item can allow the users to decide on a 
case-by-case basis whether such denotation is intended or not.

e.g.

    rdfs:Graph
      rdfs:Graph is the class of RDF Graphs as defined by RDF Concepts.

Semantics:


    <g> { …. }

    does not imply
          <g> rdf:type rdfs:Graph


but

     <g> { …. } .
     <g>  rdf:type rdfs:Graph

does imply that the interpretation of <g> is given by the graph.
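The opt-in behaviour suggested here can be pictured with a small Python sketch. It rests on assumptions: rdfs:Graph is the proposed class and does not exist in the RDF vocabulary, the function denoted_graph is invented for the illustration, and the sketch does not take a position on where the rdf:type triple has to be asserted.

```python
RDFS_GRAPH = "rdfs:Graph"  # the proposed class; hypothetical, not in RDF 1.1

def denoted_graph(name, named_graphs, asserted):
    """Return the graph `name` denotes, or None when nothing is implied.

    named_graphs: name -> iterable of triples
    asserted:     set of (subject, predicate, object) triples known to hold
    """
    if name in named_graphs and (name, "rdf:type", RDFS_GRAPH) in asserted:
        return frozenset(named_graphs[name])
    return None

named = {"<g>": [("<s>", "<p>", "<o>")]}
print(denoted_graph("<g>", named, set()))
print(denoted_graph("<g>", named, {("<g>", "rdf:type", RDFS_GRAPH)}))
```

Without the type triple the name is left unconstrained; with it, the name is interpreted as the associated graph.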


Problems with the Service Description approach
=====================================

Reading
http://www.w3.org/TR/sparql11-service-description/
my understanding is that the intent is for the endpoint to provide (closed) 
metadata about the dataset, which does not enable further comment even from 
someone with update privileges on the dataset.

e.g. in



@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix ent: <http://www.w3.org/ns/entailment/> .
@prefix prof: <http://www.w3.org/ns/owl-profile/> .
@prefix void: <http://rdfs.org/ns/void#> .

[] a sd:Service ;
     sd:endpoint <http://www.example/sparql/> ;
     sd:supportedLanguage sd:SPARQL11Query ;
     sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML>, <http://www.w3.org/ns/formats/Turtle> ;
     sd:extensionFunction <http://example.org/Distance> ;
     sd:feature sd:DereferencesURIs ;
     sd:defaultEntailmentRegime ent:RDFS ;
     sd:defaultDataset [
         a sd:Dataset ;
         sd:defaultGraph [
             a sd:Graph ;
             void:triples 100
         ] ;
         sd:namedGraph [
             a sd:NamedGraph ;
             sd:name <http://www.example/named-graph> ;
             sd:entailmentRegime ent:OWL-RDF-Based ;
             sd:supportedEntailmentProfile prof:RL ;
             sd:graph [
                 a sd:Graph ;
                 void:triples 2000
             ]
         ]
     ] .

<http://example.org/Distance> a sd:Function .


The description of the named graph is attached to an explicitly blank node, on which I 
then cannot comment further in my own graph, or indeed inside the graph named 
<http://www.example/named-graph> itself.
Thus I cannot add a dc:creator (or syapse:owningOrganization) triple inside this 
service description, because SPARQL 1.1 does not give me (nor does it intend to give 
me) write access to the service description, even if I have write access to 
<http://www.example/named-graph>.

These issues perhaps could be addressed by making sd:graph and sd:name  both 
1-1 properties …. but I imagine there may be some reluctance ….

NB - this last comment is not a formal comment on the Service Description 
Spec, which seems fit-for-purpose; it is a comment on the current resolution of 
Issue-35, which neglects that the purpose of SPARQL Service Description is less 
than is needed to address the issue.






Jeremy J Carroll
Principal Architect
Syapse, Inc.

Re: rdfs:Graph ? comment on http://www.w3.org/TR/rdf11-concepts/#section-dataset and issue 35
