Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread David Booth

On 06/08/2016 11:46 AM, james anderson wrote:

> if the goal is to leave room for the judgement call,
> assuming that dimension is free, to place each in its own graph gives
> one the latitude to make the judgement call, develop some systematic
> approach which depends on provenance, reflect the question to a manual
> choice, or just project FROM both and allow a naive merge.


Absolutely agree.



> which would seem to make it the general best practice.
> is there some case which argues against it?


Not AFAIK.  I think named graphs are one of the best tools that we have 
available, and should be considered a best practice.   I just meant to 
caution that named graphs may not *fully* solve the problem.  But they 
are the best starting point, IMO.


David Booth



Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Paul Houle
You've got it!

What matters is what your system believes is owl:sameAs based on its
viewpoint,  which could be based on who you trust to say owl:sameAs.  If
you are worried about "inference crashes" pruning this data is the place to
start.
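
A minimal sketch of that pruning step, assuming the data is held as (subject, predicate, object, graph) tuples and that trust is expressed as a set of graph names. The function name and the example URIs are made up for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def prune_same_as(quads, trusted_graphs):
    """Keep every quad except owl:sameAs statements from untrusted graphs."""
    kept = []
    for s, p, o, g in quads:
        if p == OWL_SAME_AS and g not in trusted_graphs:
            continue  # drop sameAs claims made by sources we do not trust
        kept.append((s, p, o, g))
    return kept

# Hypothetical input: two sameAs claims, only one from a trusted graph.
quads = [
    ("ex:a", OWL_SAME_AS, "ex:b", "http://trusted.example/graph"),
    ("ex:a", OWL_SAME_AS, "ex:c", "http://unknown.example/graph"),
    ("ex:a", "foaf:name", "A", "http://unknown.example/graph"),
]
pruned = prune_same_as(quads, {"http://trusted.example/graph"})
```

Only the sameAs links are filtered; other statements from the untrusted graph are left for a later judgement call.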

You might want to apply algorithm X to a graph,  but data Y fails to have
property Z necessary for X to succeed.  It is a general problem if you are
sending a product downstream.

A processing module can massage a dataset so that the output graph Y always
has property Z or it fails and calls bloody murder if Z is not set,  etc.
It can emit warning messages that you could use to sweep for bad spots,
 etc.
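
As a rough illustration of such a module, here is a sketch that checks whether every subject carries a required property, warns about each gap, and optionally fails hard. The triple layout, the function name and the sample data are assumptions, not part of any particular toolkit.

```python
import logging

log = logging.getLogger("graph-check")

def require_property(triples, subjects, prop, strict=True):
    """Check that every subject has at least one value for prop.

    Returns the subjects that lack the property. When strict is True and
    anything is missing, raises ValueError instead of returning.
    """
    have = {s for s, p, _ in triples if p == prop}
    missing = [s for s in subjects if s not in have]
    for s in missing:
        # Warnings can be collected later to sweep for bad spots.
        log.warning("%s lacks required property %s", s, prop)
    if missing and strict:
        raise ValueError(f"{len(missing)} subject(s) lack {prop}")
    return missing

# Hypothetical data: ex:b has no foaf:name.
triples = [
    ("ex:a", "foaf:name", "A"),
    ("ex:b", "foaf:mbox", "mailto:b@example.org"),
]
missing = require_property(triples, ["ex:a", "ex:b"], "foaf:name", strict=False)
```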


On Wed, Jun 8, 2016 at 1:50 PM, Rob Davidson 
wrote:

> I'm not sure if I'm following exactly, so bear with me...
>
> If we have the same entity served up by two different sources then we
> might expect in an ideal world that there would be an OWL:sameAs or
> SKOS:exactMatch linking the two.
>
> If we have the same entity served by the same provider but via two
> different endpoints then we might expect something a bit like a
> DCAT:distribution link relating the two.
>
> Of course we might not have these specific links but I'm just trying to
> define the likely scenarios/use-cases.
>
> In either case, it's possible that the descriptions would be out of date
> and/or contradictory - this might cause inference crashes or simply be
> confusing if we tried to merge them too closely.
>
> Prioritising description fields based on the distribution method seems a
> little naive in that I might run either endpoint for a while, realise my
> users prefer the alternative and thus change technology in a direction
> unique to my users - not in a predictable fashion.
>
> So the only way I can see around this is to pool the descriptions but have
> them distinguished using the other metadata that indicates they come from
> different endpoints/sources/authors - keeping the descriptions on different
> graphs I suppose.
>
>
>
>
> On 8 June 2016 at 14:52, Paul Houle  wrote:
>
>> The vanilla RDF answer is that the data gathering module ought to pack
>> all of the graphs it got into named graphs that are part of a data set and
>> then pass that towards the consumer.
>>
>> You can union the named graphs for a primitive but effective kind of
>> "merge" or put in some module downstream that composites the graphs in some
>> arbitrary manner,  such as something that converts statements about people
>> to foaf: vocabulary to produce enough graph that would be piped downstream
>> to a foaf: consumer for instance.
>>
>> The named graphs give you sufficient anchor points to fill up another
>> dataset with metadata about what happened in the processing process so you
>> can follow "who is responsible for fact X?" past the initial data
>> transformations.
>>
>> On Wed, Jun 8, 2016 at 8:29 AM, Gray, Alasdair J G 
>> wrote:
>>
>>> Hi
>>>
>>> Option 3 seems sensible, particularly if you keep them in separate
>>> graphs.
>>>
>>> However shouldn’t you consider the provenance of the sources and
>>> prioritise them on how recently they were updated?
>>>
>>> Alasdair
>>>
>>> On 8 Jun 2016, at 13:06, Martynas Jusevičius 
>>> wrote:
>>>
>>> Hey all,
>>>
>>> we are developing software that consumes data both from Linked Data
>>> and SPARQL endpoints.
>>>
>>> Most of the time, these technologies complement each other. We've come
>>> across an issue though, which occurs in situations where RDF
>>> description of the same resources is available using both of them.
>>>
>>> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
>>> as an example. Its RDF description is available in at least 2
>>> locations:
>>> - on a SPARQL endpoint:
>>>
>>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>>
>>> These descriptions could be identical (I haven't checked), but it is
>>> more likely than not that they're out of sync, complementary, or
>>> possibly even contradicting each other, if reasoning is considered.
>>>
>>> If a software agent has access to both the SPARQL endpoint and Linked
>>> Data resource, what should it consider as the resource description?
>>> There are at least 3 options:
>>> 1. prioritize SPARQL description over Linked Data
>>> 2. prioritize Linked Data description over SPARQL
>>> 3. merge both descriptions
>>>
>>> I am leaning towards #3 as the sensible solution. But then I think the
>>> end-user should be informed which part of the description came from
>>> which source. This would be problematic if the descriptions are
>>> triples only, but should be doable with quads. That leads to another
>>> problem however, that both LD and SPARQL responses are under-specified
>>> in terms of quads.
>>>
>>> What do you think? Maybe this is a well-known issue, in which case

Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread james anderson
good afternoon;

> On 2016-06-08, at 17:17, David Booth  wrote:
> 
> On 06/08/2016 08:55 AM, Martynas Jusevičius wrote:
>> So I think it would be
>> wrong to ignore the "older" description -- or any "other" description
>> in general.
> 
> This gets into the whole area of what data you choose to believe.  Some data 
> is just plain wrong, and lots of data is "correct" (i.e. usable) for some 
> uses but wrong for others and will cause inconsistency when merged.  Very 
> little data is universally "correct".
> 
> I think it is inescapable that when merging data from multiple sources you 
> need to be careful about which data you choose to include.  Putting data from 
> each source in its own named graph is one good way to help keep track of 
> where it came from, and this is useful in deciding whether to include it.  
> But that provides only coarse-grained control.  You may well need to 
> eliminate only a few triples from some source data in order to make it merge 
> without causing inconsistencies, and it can be tedious to figure out which 
> triples to drop.

all true, but if the goal is to leave room for the judgement call, assuming 
that dimension is free, to place each in its own graph gives one the latitude 
to make the judgement call, develop some systematic approach which depends on 
provenance, reflect the question to a manual choice, or just project FROM both 
and allow a naive merge.

which would seem to make it the general best practice.
is there some case which argues against it?
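
A rough sketch of the "project FROM both" option: in SPARQL, several FROM clauses make the default graph the merge of the named graphs listed. The helper below only builds the query text. The function name and the two graph URIs are assumptions for illustration.

```python
def select_from_graphs(resource, graph_uris):
    """Build a SPARQL query whose default graph is the merge of graph_uris."""
    from_clauses = "\n".join(f"FROM <{g}>" for g in graph_uris)
    return (
        "SELECT ?p ?o\n"
        f"{from_clauses}\n"
        f"WHERE {{ <{resource}> ?p ?o }}"
    )

# Hypothetical graph names, one per source of the description.
query = select_from_graphs(
    "http://data.semanticweb.org/person/andy-seaborne",
    ["http://example.org/graph/sparql", "http://example.org/graph/linked-data"],
)
```

Listing only one of the graphs gives the "judgement call" variant without changing the stored data.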

> 
> Bottom line: I don't think there is any simple answer to the question of 
> which data to include.  It requires a judgement call.

best regards, from berlin,
---
james anderson | ja...@dydra.com | http://dydra.com







Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Axel Ngonga

Hi Martynas,

Hybrid solutions do exist that can do 3. (see, e.g., [1]). However, 1. 
is definitely the most scalable approach (see, e.g., [2,3]). I'd suggest 
running 1. and 3. in parallel to ensure maximal user satisfaction.


Best,
Axel

[1] http://aderis.linkedopendata.net
[2] http://aksw.org/Projects/QUETSAL.html
[3] http://svn.aksw.org/papers/2016/Thesis_Saleem/public.pdf

On 08/06/16 14:06, Martynas Jusevičius wrote:

Hey all,

we are developing software that consumes data both from Linked Data
and SPARQL endpoints.

Most of the time, these technologies complement each other. We've come
across an issue though, which occurs in situations where RDF
description of the same resources is available using both of them.

Let's take a resource http://data.semanticweb.org/person/andy-seaborne
as an example. Its RDF description is available in at least 2
locations:
- on a SPARQL endpoint:
http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
- as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf

These descriptions could be identical (I haven't checked), but it is
more likely than not that they're out of sync, complementary, or
possibly even contradicting each other, if reasoning is considered.

If a software agent has access to both the SPARQL endpoint and Linked
Data resource, what should it consider as the resource description?
There are at least 3 options:
1. prioritize SPARQL description over Linked Data
2. prioritize Linked Data description over SPARQL
3. merge both descriptions

I am leaning towards #3 as the sensible solution. But then I think the
end-user should be informed which part of the description came from
which source. This would be problematic if the descriptions are
triples only, but should be doable with quads. That leads to another
problem however, that both LD and SPARQL responses are under-specified
in terms of quads.

What do you think? Maybe this is a well-known issue, in which case
please enlighten me with some articles :)


Martynas
atomgraph.com
@atomgraphhq






Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Daniel Herzig • SearchHaus GmbH

Hi Martynas,


We worked on that problem in [0] and used a merging strategy to consolidate 
entities.
In [1] you find a more detailed description with a screenshot [2], how this was 
presented to the user.
In essence, the user saw that the resulting entity was merged from several 
separate co-references and could open a drop-down to see the individual 
entities and their sources.
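
The general idea of a merged view that still remembers its sources could be sketched as follows. This is not the method from [0] or [1], only an illustration under the assumption that each source yields a list of (property, value) pairs. All names and data are made up.

```python
from collections import defaultdict

def consolidate(descriptions):
    """Merge per-source descriptions into one view that remembers sources.

    descriptions maps a source URI to a list of (property, value) pairs.
    The result maps each pair to the sorted list of sources asserting it.
    """
    merged = defaultdict(set)
    for source, pairs in descriptions.items():
        for pair in pairs:
            merged[pair].add(source)
    return {pair: sorted(sources) for pair, sources in merged.items()}

# Hypothetical input from two sources that partly agree.
merged = consolidate({
    "http://source-a.example/": [("foaf:name", "A. Example")],
    "http://source-b.example/": [
        ("foaf:name", "A. Example"),
        ("foaf:homepage", "http://example.org/a"),
    ],
})
```

A user interface can show the keys as the merged entity and the source lists as the drill-down.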


Cheers
Daniel



[0] 
http://www.aifb.kit.edu/images/9/90/82180161-federated-entity-search-using-on-the-fly-consolidation.pdf

[1] http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/2938247

[2] direct link to page 171 of [1]
https://books.google.de/books?id=umg4AwAAQBAJ=PR1=cp4-zPB_HH=info%3AC0NKXwkks_gJ%3Ascholar.google.com=PA171#v=onepage=false


--
Dr. Daniel Herzig
SearchHaus GmbH

GraphScope - the smart graphsearch engine
https://graphscope.io


> On 08.06.2016, at 14:55, Martynas Jusevičius  wrote:
> 
> Mikel, a lot of them do, but they are not required to. Both
> datasources work as expected, it is only when trying to combine both
> of them that one runs into this situation.
> 
> I agree that each of the descriptions could go into separate named
> graphs, where the graph name could be the source URI. That is why I
> mentioned quads.
> 
> Alasdair, with provenance do you mean PROV? I'm afraid that it is not
> available in the general case. HTTP headers could possibly be used to
> extract Last-Modified dates etc. But according to RDF semantics, isn't
> it the case that assertions are never removed? So I think it would be
> wrong to ignore the "older" description -- or any "other" description
> in general.
> 
> On Wed, Jun 8, 2016 at 2:31 PM, Mikel Egaña Aranguren
>  wrote:
>> Hi Martynas;
>> 
>> I thought that the majority of Linked Data servers work like Pubby, i.e.,
>> they serve Linked Data resources by doing a DESCRIBE on a Triple Store,
>> therefore serving the same triples. But it seems like you have encountered
>> the opposite (Different triples served) in many systems, do you have data on
>> how prevalent this issue is?
>> 
>> Cheers
>> 
>> 2016-06-08 14:06 GMT+02:00 Martynas Jusevičius :
>>> 
>>> Hey all,
>>> 
>>> we are developing software that consumes data both from Linked Data
>>> and SPARQL endpoints.
>>> 
>>> Most of the time, these technologies complement each other. We've come
>>> across an issue though, which occurs in situations where RDF
>>> description of the same resources is available using both of them.
>>> 
>>> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
>>> as an example. Its RDF description is available in at least 2
>>> locations:
>>> - on a SPARQL endpoint:
>>> 
>>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>> 
>>> These descriptions could be identical (I haven't checked), but it is
>>> more likely than not that they're out of sync, complementary, or
>>> possibly even contradicting each other, if reasoning is considered.
>>> 
>>> If a software agent has access to both the SPARQL endpoint and Linked
>>> Data resource, what should it consider as the resource description?
>>> There are at least 3 options:
>>> 1. prioritize SPARQL description over Linked Data
>>> 2. prioritize Linked Data description over SPARQL
>>> 3. merge both descriptions
>>> 
>>> I am leaning towards #3 as the sensible solution. But then I think the
>>> end-user should be informed which part of the description came from
>>> which source. This would be problematic if the descriptions are
>>> triples only, but should be doable with quads. That leads to another
>>> problem however, that both LD and SPARQL responses are under-specified
>>> in terms of quads.
>>> 
>>> What do you think? Maybe this is a well-known issue, in which case
>>> please enlighten me with some articles :)
>>> 
>>> 
>>> Martynas
>>> atomgraph.com
>>> @atomgraphhq
>>> 
>> 
>> 
>> 
>> --
>> Mikel Egaña Aranguren, Ph.D.
>> 
>> http://mikeleganaaranguren.com
>> 
>> 
> 
> 



Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Paul Houle
The vanilla RDF answer is that the data gathering module ought to pack all
of the graphs it got into named graphs that are part of a data set and then
pass that towards the consumer.

You can union the named graphs for a primitive but effective kind of
"merge", or put in some module downstream that composites the graphs in
some arbitrary manner,  such as something that converts statements about
people to the foaf: vocabulary to produce a graph that can be piped
downstream to a foaf: consumer, for instance.

The named graphs give you sufficient anchor points to fill up another
dataset with metadata about what happened in the processing process so you
can follow "who is responsible for fact X?" past the initial data
transformations.
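
A minimal sketch of the union and of the "who is responsible for fact X?" lookup, assuming the dataset is a plain mapping from graph name to a set of triples. The graph names and triples are invented for illustration.

```python
def union(dataset):
    """Naive merge: the set of all triples found in any named graph."""
    merged = set()
    for triples in dataset.values():
        merged |= set(triples)
    return merged

def who_asserts(dataset, triple):
    """Return the names of the graphs that contain the given triple."""
    return sorted(name for name, triples in dataset.items() if triple in triples)

# Hypothetical dataset with one named graph per source.
dataset = {
    "http://example.org/graph/sparql": {("ex:andy", "foaf:name", "Andy")},
    "http://example.org/graph/linked-data": {
        ("ex:andy", "foaf:name", "Andy"),
        ("ex:andy", "foaf:homepage", "http://example.org/andy"),
    },
}
merged = union(dataset)
sources = who_asserts(dataset, ("ex:andy", "foaf:homepage", "http://example.org/andy"))
```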

On Wed, Jun 8, 2016 at 8:29 AM, Gray, Alasdair J G 
wrote:

> Hi
>
> Option 3 seems sensible, particularly if you keep them in separate graphs.
>
> However shouldn’t you consider the provenance of the sources and
> prioritise them on how recently they were updated?
>
> Alasdair
>
> On 8 Jun 2016, at 13:06, Martynas Jusevičius 
> wrote:
>
> Hey all,
>
> we are developing software that consumes data both from Linked Data
> and SPARQL endpoints.
>
> Most of the time, these technologies complement each other. We've come
> across an issue though, which occurs in situations where RDF
> description of the same resources is available using both of them.
>
> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
> as an example. Its RDF description is available in at least 2
> locations:
> - on a SPARQL endpoint:
>
> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>
> These descriptions could be identical (I haven't checked), but it is
> more likely than not that they're out of sync, complementary, or
> possibly even contradicting each other, if reasoning is considered.
>
> If a software agent has access to both the SPARQL endpoint and Linked
> Data resource, what should it consider as the resource description?
> There are at least 3 options:
> 1. prioritize SPARQL description over Linked Data
> 2. prioritize Linked Data description over SPARQL
> 3. merge both descriptions
>
> I am leaning towards #3 as the sensible solution. But then I think the
> end-user should be informed which part of the description came from
> which source. This would be problematic if the descriptions are
> triples only, but should be doable with quads. That leads to another
> problem however, that both LD and SPARQL responses are under-specified
> in terms of quads.
>
> What do you think? Maybe this is a well-known issue, in which case
> please enlighten me with some articles :)
>
>
> Martynas
> atomgraph.com
> @atomgraphhq
>
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science,
> School of Mathematical and Computer Sciences
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
>
> Email: a.j.g.g...@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> Founded in 1821, Heriot-Watt is a leader in ideas and solutions. With
> campuses and students across the entire globe we span the world, delivering
> innovation and educational excellence in business, engineering, design and
> science.
>
> The contents of this e-mail (including any attachments) are confidential.
> If you are not the intended recipient of this e-mail, any disclosure,
> copying, distribution or use of its contents is strictly prohibited, and
> you should please notify the sender immediately and then delete it
> (including any attachments) from your system.
>



-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254 | paul.houle on Skype | ontolo...@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/


Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275


Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Martynas Jusevičius
Mikel, a lot of them do, but they are not required to. Both
data sources work as expected; it is only when trying to combine
them that one runs into this situation.

I agree that each of the descriptions could go into separate named
graphs, where the graph name could be the source URI. That is why I
mentioned quads.
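
A small sketch of that idea, assuming triples are tuples of IRI strings: the source URI becomes the fourth element of each quad, which can then be written out as an N-Quads line. The function names and the example.org terms are made up, and literals and blank nodes are deliberately left out.

```python
def to_quads(triples, source_uri):
    """Turn triples into quads, using the source URI as the graph name."""
    return [(s, p, o, source_uri) for s, p, o in triples]

def nquads_line(quad):
    """Serialise a quad made only of IRIs as one N-Quads line."""
    return " ".join(f"<{term}>" for term in quad) + " ."

# Hypothetical triple, tagged with the Linked Data document it came from.
quads = to_quads(
    [("http://example.org/s", "http://example.org/p", "http://example.org/o")],
    "http://data.semanticweb.org/person/andy-seaborne/rdf",
)
line = nquads_line(quads[0])
```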

Alasdair, with provenance do you mean PROV? I'm afraid that it is not
available in the general case. HTTP headers could possibly be used to
extract Last-Modified dates etc. But according to RDF semantics, isn't
it the case that assertions are never removed? So I think it would be
wrong to ignore the "older" description -- or any "other" description
in general.
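
For the Last-Modified idea, a sketch using only the Python standard library. It assumes the response headers are already available as a plain dict with that exact key; the sample header value is invented. The date is recorded as metadata and nothing is discarded on the basis of it.

```python
from email.utils import parsedate_to_datetime

def last_modified(headers):
    """Parse the Last-Modified header into a datetime, or None if unusable."""
    value = headers.get("Last-Modified")  # plain dict: lookup is case-sensitive
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # malformed date: treat the age as unknown

# Hypothetical header value.
stamp = last_modified({"Last-Modified": "Wed, 08 Jun 2016 12:06:00 GMT"})
```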

On Wed, Jun 8, 2016 at 2:31 PM, Mikel Egaña Aranguren
 wrote:
> Hi Martynas;
>
> I thought that the majority of Linked Data servers work like Pubby, i.e.,
> they serve Linked Data resources by doing a DESCRIBE on a Triple Store,
> therefore serving the same triples. But it seems like you have encountered
> the opposite (Different triples served) in many systems, do you have data on
> how prevalent this issue is?
>
> Cheers
>
> 2016-06-08 14:06 GMT+02:00 Martynas Jusevičius :
>>
>> Hey all,
>>
>> we are developing software that consumes data both from Linked Data
>> and SPARQL endpoints.
>>
>> Most of the time, these technologies complement each other. We've come
>> across an issue though, which occurs in situations where RDF
>> description of the same resources is available using both of them.
>>
>> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
>> as an example. Its RDF description is available in at least 2
>> locations:
>> - on a SPARQL endpoint:
>>
>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>
>> These descriptions could be identical (I haven't checked), but it is
>> more likely than not that they're out of sync, complementary, or
>> possibly even contradicting each other, if reasoning is considered.
>>
>> If a software agent has access to both the SPARQL endpoint and Linked
>> Data resource, what should it consider as the resource description?
>> There are at least 3 options:
>> 1. prioritize SPARQL description over Linked Data
>> 2. prioritize Linked Data description over SPARQL
>> 3. merge both descriptions
>>
>> I am leaning towards #3 as the sensible solution. But then I think the
>> end-user should be informed which part of the description came from
>> which source. This would be problematic if the descriptions are
>> triples only, but should be doable with quads. That leads to another
>> problem however, that both LD and SPARQL responses are under-specified
>> in terms of quads.
>>
>> What do you think? Maybe this is a well-known issue, in which case
>> please enlighten me with some articles :)
>>
>>
>> Martynas
>> atomgraph.com
>> @atomgraphhq
>>
>
>
>
> --
> Mikel Egaña Aranguren, Ph.D.
>
> http://mikeleganaaranguren.com
>
>



Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Mikel Egaña Aranguren
Hi Martynas;

I thought that the majority of Linked Data servers work like Pubby, i.e.,
they serve Linked Data resources by doing a DESCRIBE on a triple store,
therefore serving the same triples. But it seems you have encountered
the opposite (different triples served) in many systems. Do you have data
on how prevalent this issue is?
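
One way to gather such data would be to compare the two descriptions directly. A rough sketch, assuming both are already parsed into (subject, predicate, object) tuples and contain no blank nodes, which plain set comparison cannot match up. Names and data are illustrative only.

```python
def compare(sparql_triples, ld_triples):
    """Split two descriptions into shared and source-specific triples."""
    a, b = set(sparql_triples), set(ld_triples)
    return {"both": a & b, "sparql_only": a - b, "linked_data_only": b - a}

# Hypothetical descriptions that overlap in one triple.
diff = compare(
    [("ex:r", "ex:p", "1"), ("ex:r", "ex:q", "2")],
    [("ex:r", "ex:p", "1"), ("ex:r", "ex:q", "3")],
)
```

Two empty "only" sets mean the server behaves like the Pubby case described above.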

Cheers

2016-06-08 14:06 GMT+02:00 Martynas Jusevičius :

> Hey all,
>
> we are developing software that consumes data both from Linked Data
> and SPARQL endpoints.
>
> Most of the time, these technologies complement each other. We've come
> across an issue though, which occurs in situations where RDF
> description of the same resources is available using both of them.
>
> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
> as an example. Its RDF description is available in at least 2
> locations:
> - on a SPARQL endpoint:
>
> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>
> These descriptions could be identical (I haven't checked), but it is
> more likely than not that they're out of sync, complementary, or
> possibly even contradicting each other, if reasoning is considered.
>
> If a software agent has access to both the SPARQL endpoint and Linked
> Data resource, what should it consider as the resource description?
> There are at least 3 options:
> 1. prioritize SPARQL description over Linked Data
> 2. prioritize Linked Data description over SPARQL
> 3. merge both descriptions
>
> I am leaning towards #3 as the sensible solution. But then I think the
> end-user should be informed which part of the description came from
> which source. This would be problematic if the descriptions are
> triples only, but should be doable with quads. That leads to another
> problem however, that both LD and SPARQL responses are under-specified
> in terms of quads.
>
> What do you think? Maybe this is a well-known issue, in which case
> please enlighten me with some articles :)
>
>
> Martynas
> atomgraph.com
> @atomgraphhq
>
>


-- 
Mikel Egaña Aranguren, Ph.D.

http://mikeleganaaranguren.com


Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Gray, Alasdair J G
Hi

Option 3 seems sensible, particularly if you keep them in separate graphs.

However shouldn’t you consider the provenance of the sources and prioritise 
them on how recently they were updated?

Alasdair

On 8 Jun 2016, at 13:06, Martynas Jusevičius 
> wrote:

Hey all,

we are developing software that consumes data both from Linked Data
and SPARQL endpoints.

Most of the time, these technologies complement each other. We've come
across an issue though, which occurs in situations where RDF
description of the same resources is available using both of them.

Let's take a resource http://data.semanticweb.org/person/andy-seaborne
as an example. Its RDF description is available in at least 2
locations:
- on a SPARQL endpoint:
http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
- as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf

These descriptions could be identical (I haven't checked), but it is
more likely than not that they're out of sync, complementary, or
possibly even contradicting each other, if reasoning is considered.

If a software agent has access to both the SPARQL endpoint and Linked
Data resource, what should it consider as the resource description?
There are at least 3 options:
1. prioritize SPARQL description over Linked Data
2. prioritize Linked Data description over SPARQL
3. merge both descriptions

I am leaning towards #3 as the sensible solution. But then I think the
end-user should be informed which part of the description came from
which source. This would be problematic if the descriptions are
triples only, but should be doable with quads. That leads to another
problem however, that both LD and SPARQL responses are under-specified
in terms of quads.

What do you think? Maybe this is a well-known issue, in which case
please enlighten me with some articles :)


Martynas
atomgraph.com
@atomgraphhq


Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: a.j.g.g...@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair













Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Martynas Jusevičius
Hey all,

we are developing software that consumes data both from Linked Data
and SPARQL endpoints.

Most of the time, these technologies complement each other. We've come
across an issue though, which occurs in situations where RDF
description of the same resources is available using both of them.

Let's take a resource http://data.semanticweb.org/person/andy-seaborne
as an example. Its RDF description is available in at least 2
locations:
- on a SPARQL endpoint:
http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
- as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
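
For reference, the SPARQL endpoint URL above is just a percent-encoded DESCRIBE query. A minimal sketch of how it can be built with the Python standard library (the function name is made up):

```python
from urllib.parse import quote

def describe_url(endpoint, resource):
    """Build a GET URL that sends a DESCRIBE query for resource to endpoint."""
    query = f"DESCRIBE <{resource}>"
    # safe="" also encodes "/" and ":", matching the URL shown above.
    return f"{endpoint}?query={quote(query, safe='')}"

url = describe_url(
    "http://xmllondon.com/sparql",
    "http://data.semanticweb.org/person/andy-seaborne",
)
```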

These descriptions could be identical (I haven't checked), but it is
more likely than not that they're out of sync, complementary, or
possibly even contradicting each other, if reasoning is considered.

If a software agent has access to both the SPARQL endpoint and Linked
Data resource, what should it consider as the resource description?
There are at least 3 options:
1. prioritize SPARQL description over Linked Data
2. prioritize Linked Data description over SPARQL
3. merge both descriptions

I am leaning towards #3 as the sensible solution. But then I think the
end-user should be informed which part of the description came from
which source. This would be problematic if the descriptions are
triples only, but should be doable with quads. That leads to another
problem however, that both LD and SPARQL responses are under-specified
in terms of quads.
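
The three options can be sketched roughly as below. This assumes both descriptions are already parsed into triples, and it reads "prioritize" as "use the preferred source and fall back to the other only when the preferred one is empty", which is just one possible interpretation. The source labels and function name are made up.

```python
def resolve(sparql_triples, ld_triples, strategy="merge"):
    """Pick a resource description, tagging each triple with its source."""
    sparql = {(s, p, o, "sparql") for s, p, o in sparql_triples}
    ld = {(s, p, o, "linked-data") for s, p, o in ld_triples}
    if strategy == "sparql":        # option 1
        return sparql or ld
    if strategy == "linked-data":   # option 2
        return ld or sparql
    if strategy == "merge":         # option 3: keep both, source stays visible
        return sparql | ld
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical descriptions that disagree on one value.
merged = resolve([("ex:r", "ex:p", "1")], [("ex:r", "ex:p", "2")])
```

Because the source travels with every triple, option 3 can still tell the end user which part came from where.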

What do you think? Maybe this is a well-known issue, in which case
please enlighten me with some articles :)


Martynas
atomgraph.com
@atomgraphhq