Re: Size a linked open data set

2016-07-19 Thread Gray, Alasdair J G
Hi

You may be interested in the rich dataset statistics that are reported as part 
of the Health Care and Life Sciences Community Profile for dataset 
descriptions; these extend the properties given in the VoID vocabulary.
https://www.w3.org/TR/hcls-dataset/#s6_6
The linked section gives a description of the statistic reported and the SPARQL 
query that is used to generate the values.

Best regards,

Alasdair

On 13 Jul 2016, at 17:05, Jean-Claude Moissinac wrote:

Many thanks John for the elegant solution.

My understanding is that
select count(distinct ?r) where { ?r ?p ?l }
is semantically equivalent to
select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }
Both give the count of distinct subjects in the graph, so the difference is only 
a result of the internal implementation. So it seems one needs to know a lot 
about the implementation to know how to get the result.
Am I wrong?
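
To make the comparison concrete, here is a minimal Python sketch over a tiny in-memory list of triples. The data and function names are made up for illustration, and this is not how any SPARQL engine evaluates these queries. It only shows that the plain count returns one result per matching triple, while the two distinct formulations should agree on the number of distinct subjects.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "rdf:type", "ex:Thing"),
    ("ex:b", "rdfs:label", "B"),
    ("ex:b", "ex:linksTo", "ex:a"),
    ("ex:c", "rdfs:label", "C"),
]


def count_solutions(triples):
    """Analogue of: select count(?r) where { ?r ?p ?l } (one solution per triple)."""
    return sum(1 for _s, _p, _o in triples)


def count_distinct_inline(triples):
    """Analogue of: select count(distinct ?r) where { ?r ?p ?l }."""
    seen = set()
    for s, _p, _o in triples:
        seen.add(s)
    return len(seen)


def count_distinct_subquery(triples):
    """Analogue of the subquery form: distinct subjects first, then count them."""
    distinct_subjects = {s for s, _p, _o in triples}  # inner "select distinct ?s"
    return sum(1 for _ in distinct_subjects)          # outer "count(?s)"
```

On this toy data the plain count is the number of triples, and both distinct variants return the same, smaller number. Any difference in behaviour on a real endpoint would then come from how the engine plans and executes each form, not from the meaning of the queries.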



--
Jean-Claude Moissinac


2016-07-06 15:55 GMT+02:00 John Walker:
How about reformulating as:

select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }

Which gives a result of 10515620 resources [1].

Regards,
John

[1] 
http://fr.dbpedia.org/sparql?default-graph-uri==select+%28count%28%3Fs%29+as+%3Fc%29+where+%7B+select+distinct+%3Fs+where+%7B+%3Fs+%3Fp+%5B%5D%7D+%7D=text%2Fhtml=0=on


-----Original Message-----
From: Hugh Williams 
[mailto:hwilli...@openlinksw.com]
Sent: Wednesday, July 06, 2016 3:15 PM
To: Jean-Claude Moissinac
Cc: public-lod
Subject: Re: Size a linked open data set

Hi Jean-Claude,

The "select count(distinct ?r) where { ?r ?p ?l }" query is expensive in terms 
of database resources: servicing it requires a huge hash table to be created, 
which is causing it to time out under the settings configured on the instance 
by whoever maintains it.

On http://dbpedia.org/sparql, the original canonical English DBpedia endpoint 
that OpenLink Software hosts, we provide preloaded VOID datasets so that they 
don't have to be queried each time; see http://dbpedia.org/void/Dataset . 
The French DBpedia instance does not appear to have this, i.e. 
http://fr.dbpedia.org/void/Dataset

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 6 Jul 2016, at 12:49, Jean-Claude Moissinac wrote:
>
> Hello
>
> In my work, I need to know the number of distinct resources in a dataset.
> For example, with dbpedia-fr, I'm trying
> select count(distinct ?r) where { ?r ?p ?l }
>
> And I'm always getting a timeout error message
> While with
> select count(?r) where { ?r ?p ?l }
> I'm getting
> 185404575
>
> Is it a good way to know about such size?
>
> --
> Jean-Claude Moissinac
>



Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: a.j.g.g...@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair












Founded in 1821, Heriot-Watt is a leader in ideas and solutions. With campuses 
and students across the entire globe we span the world, delivering innovation 
and educational excellence in business, engineering, design and the physical, 
social and life sciences.

The contents of this e-mail (including any attachments) are confidential. If 
you are not the intended recipient of this e-mail, any disclosure, copying, 
distribution or use of its contents is strictly prohibited, and you should 
please notify the sender immediately and then delete it (including any 
attachments) from your system.


Re: Dealing with distributed nature of Linked Data and SPARQL

2016-06-08 Thread Gray, Alasdair J G
Hi

Option 3 seems sensible, particularly if you keep them in separate graphs.

However, shouldn't you consider the provenance of the sources and prioritise 
them by how recently they were updated?

Alasdair

On 8 Jun 2016, at 13:06, Martynas Jusevičius wrote:

Hey all,

we are developing software that consumes data both from Linked Data
and SPARQL endpoints.

Most of the time, these technologies complement each other. We've come
across an issue though, which occurs in situations where RDF
description of the same resources is available using both of them.

Let's take a resource http://data.semanticweb.org/person/andy-seaborne
as an example. Its RDF description is available in at least 2
locations:
- on a SPARQL endpoint:
http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
- as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf

These descriptions could be identical (I haven't checked), but it is
more likely than not that they're out of sync, complementary, or
possibly even contradicting each other, if reasoning is considered.

If a software agent has access to both the SPARQL endpoint and Linked
Data resource, what should it consider as the resource description?
There are at least 3 options:
1. prioritize SPARQL description over Linked Data
2. prioritize Linked Data description over SPARQL
3. merge both descriptions

I am leaning towards #3 as the sensible solution. But then I think the
end-user should be informed which part of the description came from
which source. This would be problematic if the descriptions are
triples only, but should be doable with quads. That leads to another
problem however, that both LD and SPARQL responses are under-specified
in terms of quads.
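
As a rough illustration of option 3 with quads, the following Python sketch tags every triple with a graph name for the source it came from and then reports which sources assert each triple. The graph names, the example resource and its values are all hypothetical, and plain tuples stand in for a real RDF library.

```python
from collections import defaultdict

# Hypothetical graph names, one per source; any IRIs would do.
SPARQL_GRAPH = "urn:example:source:sparql"
LD_GRAPH = "urn:example:source:linked-data"


def to_quads(triples, graph):
    """Tag each (s, p, o) triple with the graph it was retrieved from."""
    return {(s, p, o, graph) for (s, p, o) in triples}


def merge_descriptions(sparql_triples, ld_triples):
    """Option 3: merge both descriptions, keeping one named graph per source."""
    return to_quads(sparql_triples, SPARQL_GRAPH) | to_quads(ld_triples, LD_GRAPH)


def sources_by_triple(quads):
    """Map each triple to the sorted list of graphs that assert it."""
    index = defaultdict(set)
    for s, p, o, graph in quads:
        index[(s, p, o)].add(graph)
    return {triple: sorted(graphs) for triple, graphs in index.items()}


# Toy descriptions of one made-up resource.
PERSON = "http://example.org/person/1"
from_sparql = {(PERSON, "foaf:name", "Example Person")}
from_ld = {
    (PERSON, "foaf:name", "Example Person"),
    (PERSON, "foaf:homepage", "http://example.org/home"),
}

merged = merge_descriptions(from_sparql, from_ld)
provenance = sources_by_triple(merged)
```

Here `provenance` lists both graphs for the shared name triple and only the Linked Data graph for the homepage triple, which is the kind of per-source information an end-user interface could display.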

What do you think? Maybe this is a well-known issue, in which case
please enlighten me with some articles :)


Martynas
atomgraph.com
@atomgraphhq




Re: Astronomy meets Semantic Web/Linked Data?

2016-03-31 Thread Gray, Alasdair J G
Hi Aidan, Ismael

You should look into what the IVOA semantics working group are doing
http://wiki.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science
Heriot-Watt University, Edinburgh

www.macs.hw.ac.uk/~ajg33


From: Aidan Hogan
Sent: Thursday 31 March 06:07
Subject: Astronomy meets Semantic Web/Linked Data?
To: LOD List, Semantic Web IG
Cc: ismael alvarez

Hi all,

[Apologies for cross-posting but I'm not sure which list fits best.]

Ismael (cc'ed) is thinking about a possible masters topic in the intersection 
of his twin passions of Astronomy and Semantic Web/Linked Data. It seems our 
cousins in Astronomy have lots of problems coping with large amounts of diverse 
data, and we have the typical integration problems across different 
observatories, as well as questions of how to make data public in a reusable 
manner, and so forth. So trying to apply SW/LD methodologies to the area of 
Astronomy would seem to make a lot of sense.

However, in Googling around, I could find very little if any work in this 
intersection, which I find a little puzzling. Hence I'm just looking for 
pointers to any works or groups or people or tools or resources or papers, 
etc., in the intersection of astronomy and SW/LD. It could be related to the 
use of RDF, ontologies, SPARQL, Linked Data, etc., for astronomical data. (The 
only thing I'm currently aware of is the presence of some descriptions in 
knowledge-bases like DBpedia and Wikidata of well-known galaxies and other 
general-interest resources.)

We would be very grateful for any pointers along those lines.

Thanks in advance!

Aidan, Ismael


Call for Participation: SLiDInG5

2015-06-19 Thread Gray, Alasdair J G
The 5th Scottish Linked Data Interest Group Workshop (SLiDInG5) will take place 
in Edinburgh on Thursday 9 July 2015, co-located with the 30th British 
International Conference on Databases.

The workshop is free to attend and provides a forum for highlighting the latest 
advances in Linked Data applications and research in Scotland. Confirmed 
speakers include

  *   Gregor Boyd, Scottish Government
  *   Peter Christen, Australian National University
  *   Hiromichi Kobashi, Fujitsu Laboratories Ltd
  *   Paolo Pareti, University of Edinburgh

We are also planning a demo session to allow for the display of various Linked 
Data applications and a panel discussion to finish the day off.

For more details and to register to attend please visit the workshop webpage
https://sites.google.com/site/scottishlinkeddata/events/sliding5

I look forward to welcoming you all in Edinburgh,

Alasdair

--
Alasdair J G Gray
Lecturer, Heriot-Watt University
Web: http://www.alasdairjggray.co.uk
ORCID: http://orcid.org/-0002-5711-4872
Twitter: @gray_alasdair
Telephone: +44 131 451 3429
Office: EM 1.39


Re: Quick Poll

2015-01-27 Thread Gray, Alasdair J G
I think you would find that the original dataset tends to be the 
void:subjectsTarget but with symmetric properties it doesn't really matter.

What you really need is a provenance model of how the data was generated.

Alasdair J G Gray
http://www.orcid.org/-0002-5711-4872

On 26 Jan 2015 07:51, Stian Soiland-Reyes soiland-re...@cs.manchester.ac.uk 
wrote:

I would have to check the VOIDs we consume, but I believe generally if you are 
example.com, you try to publish linksets to other URIs with your 
example.com/* URIs as subjects - after all those are what you claim to be 
authoritative for.

The choice of an asymmetric property may mean sacrificing this principle if 
there is no inverse, but often there is - for instance prov:specializationOf vs 
prov:generalizationOf.

In Open PHACTS we used to calculate the inverse direction of symmetrical 
properties at data loading time, but now we do that (and transitive identifier 
mapping) on the fly at query time.

It turns out that when you ask domain scientists, they might disagree with the 
owl:sameAs statements in one of the directions (!) - so this is now 
configurable per dataset.
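
A minimal Python sketch of the idea of following symmetric link properties in both directions at query time, with the inverse direction switched on or off per dataset. This is only an illustration under assumed names: the property set, the `INVERSE_ALLOWED` configuration and the toy links are hypothetical and are not taken from the Open PHACTS code.

```python
# Properties treated as symmetric in this sketch.
SYMMETRIC = {"owl:sameAs", "skos:exactMatch"}

# Hypothetical per-dataset switch: may links be followed object -> subject?
INVERSE_ALLOWED = {"linkset-1": True, "linkset-2": False}

# Toy links as (subject, property, object, dataset) tuples.
LINKS = [
    ("ex:a", "owl:sameAs", "ex:b", "linkset-1"),
    ("ex:c", "owl:sameAs", "ex:d", "linkset-2"),
    ("ex:e", "prov:specializationOf", "ex:a", "linkset-1"),
]


def linked_from(uri, links, inverse_allowed=INVERSE_ALLOWED):
    """Return the URIs reachable from `uri` in one step.

    Stated links are always followed subject -> object. The inverse
    direction is followed only for symmetric properties, and only when
    the dataset that holds the link allows it.
    """
    found = set()
    for s, p, o, dataset in links:
        if s == uri:
            found.add(o)
        elif o == uri and p in SYMMETRIC and inverse_allowed.get(dataset, False):
            found.add(s)
    return found
```

With these toy links, `linked_from("ex:b", LINKS)` reaches `ex:a` because linkset-1 allows the inverse direction, while `linked_from("ex:d", LINKS)` returns nothing because linkset-2 does not. Computing this on the fly avoids storing the inverse triples at load time.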

On 25 Jan 2015 12:28, Hugh Glaser h...@glasers.org wrote:
Thanks Stian and Alasdair,
Just going back to the original question for a moment, I’ll try another way of 
putting it for people who don’t have their own URIs.

When you want to create a set of links (of the sorts of properties you are 
talking about, but only the symmetric ones), you have often started with 
candidate URIs, and then found other URIs that have that relationship.
When you create the triples to record this valuable information, does the 
original candidate appear as
a) subject, b) object, or c) just whatever, or d) maybe you assert 2 triples 
both ways?

It’s a very qualitative and woolly question, I realise :-)
Thanks.


 On 25 Jan 2015, at 10:45, Stian Soiland-Reyes 
 soiland-re...@cs.manchester.ac.uk wrote:

 For properties you would need to use owl:equivalentProperty or 
 rdfs:subPropertyOf in either direction.

 SKOS is very useful here as an alternative when the logic gets dirty due to 
 loose term definition.

 As an example see this SKOS mapping from PAV onto Dublin Core Terms (which 
 are notoriously underspecified and vague):

 http://www.jbiomedsem.com/content/4/1/37/table/T5

 http://www.jbiomedsem.com/content/4/1/37 (Results)

 Actual SKOS: http://purl.org/pav/mapping/dcterms

 Here we found SKOS as a nice way to do the mapping independently (and 
 justified) as the inferences from OWL make DC Term incompatible with any 
 causal provenance ontology like PROV and PAV.

 On 23 Jan 2015 17:59, Hugh Glaser h...@glasers.org wrote:
 Thanks, and thanks for all the answers so far.

  On 23 Jan 2015, at 16:23, Stian Soiland-Reyes 
  soiland-re...@cs.manchester.ac.uk wrote:
 
  Not sure where you are going, but you are probably interested in
  linksets - as a way to package equivalence relations - typically in a
  graph of its own.
 Thanks - I have a lot of linksets :-)
 
  http://www.w3.org/TR/void/#describing-linksets
 
 
 
  To answer the questions:
 
  Q1: d) in subject, property, object, or multiple of those.
 I don’t understand where property comes in for using owl:sameAs (or whatever) 
 in stating equivalence between URIs, so I’ll read that as c)
 
 
  Q2: No. We already reuse existing vocabularies and external
  identifiers, and there could be a nested structure which is only
  indirectly connected to our URIs.
 I realise that this second question wasn’t as clear as it might have been.
 What I meant was concerned with the sameAs triples only (as was explicit for 
 Q1).
 So, to elaborate, if you have decided that:
 http://mysite.com/foo, http://dbpedia.org/resource/foo, 
 http://rdf.freebase.com/ns/m.05195d8
 are aligned (the same), then what do the triples describing that look like?
 In particular, do you have any that look like
 http://dbpedia.org/resource/foo owl:sameAs 
 http://rdf.freebase.com/ns/m.05195d8 .
 (or vice versa), or do you equate everything to a “mysite” URI?

 But I guess for OpenPHACTS this doesn’t apply, since I understand from what 
 you say below that you never mint a URI of your own where you know there is 
 an external one.
 Although it does beg the question, perhaps, of what you do when you later 
 find equivalences.

 Best
 Hugh
 
  http://example.com/our/own pav:authoredBy
  http://orcid.org/-0001-9842-9718 .
  http://orcid.org/-0001-9842-9718 foaf:name Stian Soiland-Reyes .
 
  It's true you would also get the second triple from ORCID (remember
  content negotiation!), but it's very useful for presentation and query
  purposes to include these directly, e.g. in a VOID file.
 
  In most cases, however, we do not have any URIs of our own except for
  provenance statements. But perhaps Open PHACTS is 

Re: Quick Poll

2015-01-24 Thread Gray, Alasdair J G

On 23 Jan 2015, at 17:59, Hugh Glaser h...@glasers.org wrote:

On 23 Jan 2015, at 16:23, Stian Soiland-Reyes 
soiland-re...@cs.manchester.ac.uk wrote:

[snip]

But I guess for OpenPHACTS this doesn’t apply, since I understand from what you 
say below that you never mint a URI of your own where you know there is an 
external one.
Although it does beg the question, perhaps, of what you do when you later find 
equivalences.

We don’t focus on the link predicate so much as the metadata associated with 
the linkset which tells us why the concepts in the two datasets are equated. 
For example, between two chemical databases we may have multiple linksets which 
use different properties for equivalence; one linkset may be because they have 
the same chemical structure and in another linkset we have a different set of 
links with the justification that they have the same chemical name. These are 
then used to satisfy different use cases. You can find a fuller description in

C. Batchelor, C. Brenninkmeijer, C. Chichester, M. Davies, D. Digles, I. 
Dunlop, C. T. Evelo, A. Gaulton, C. Goble, A. J. G. Gray, P. Groth, L. Harland, 
K. Karapetyan, A. Loizou, J. P. Overington, S. Pettifer, J. Steele, R. Stevens, 
V. Tkachenko, A. Waagmeester, A. Williams, and E. L. Willighagen, “Scientific 
lenses to support multiple views over linked chemistry data,” in ISWC In Use, 
2014, pp. 1–16.
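
As a rough sketch of the idea described above, where linksets carry a justification and different use cases accept different justifications, the following Python models a linkset with its metadata and selects links by accepted justification. The class, the justification labels and the identifiers are hypothetical; this is not the data model from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkset:
    name: str
    justification: str  # why the linked concepts are considered equivalent
    links: frozenset    # set of (source_uri, target_uri) pairs


def links_for_lens(linksets, accepted_justifications):
    """Collect links only from linksets whose justification is accepted."""
    selected = set()
    for linkset in linksets:
        if linkset.justification in accepted_justifications:
            selected |= linkset.links
    return selected


# Two made-up linksets between the same pair of chemical databases.
LINKSETS = [
    Linkset("by-structure", "same-chemical-structure",
            frozenset({("dbA:1", "dbB:10")})),
    Linkset("by-name", "same-chemical-name",
            frozenset({("dbA:1", "dbB:11"), ("dbA:2", "dbB:20")})),
]

# Hypothetical lenses: a strict one and a more relaxed one.
STRICT_LENS = {"same-chemical-structure"}
RELAXED_LENS = {"same-chemical-structure", "same-chemical-name"}
```

Calling `links_for_lens(LINKSETS, STRICT_LENS)` keeps only the structure-based link, whereas the relaxed lens also admits the name-based links, so the same data can serve use cases with different tolerance for loose equivalence.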


Alasdair J G Gray
Lecturer in Computer Science, Heriot-Watt University, UK.
Email: a.j.g.g...@hw.ac.uk
Web: http://www.alasdairjggray.co.uk
ORCID: http://orcid.org/-0002-5711-4872
Telephone: +44 131 451 3429
Twitter: @gray_alasdair








We invite research leaders and ambitious early career researchers to 
join us in leading and driving research in key inter-disciplinary themes. 
Please see www.hw.ac.uk/researchleaders for further information and how
to apply.

Heriot-Watt University is a Scottish charity
registered under charity number SC000278.



Deadline extension: ISWC'14 Workshop on Context, Interpretation and Meaning (CIM2014)

2014-07-07 Thread Gray, Alasdair J G
Deadline extended to 14 July 2014

Apologies for cross-posting


---


International Workshop on Context, Interpretation and Meaning (CIM2014)

http://www.macs.hw.ac.uk/~fm206/cim14/

19 or 20 October 2014

Riva del Garda, Trentino, Italy

Collocated with the 13th International Semantic Web Conference (ISWC2014).





One size does not fit all use cases when inter-relating real-world
datasets. For example should data about two cities be matched on their
geographic coverage, name or regional government boundary? It depends upon
the use to which the data will be put. For emergency response you could
imagine needing all available data while for longitudinal studies more
precision would be required. Similar issues arise in life sciences when
relating genes, proteins and nucleotides, or chemistry when matching
compounds.


Ontology alignment and linked data have to date focused on generating a
mapping for a given application. This workshop will explore the potential
for reusing, reinterpreting and contextualising mappings, and on whether or
not they can be effectively crowd-sourced. To do so, the way in which the
mapping has been generated needs to be understood so that the implied
meaning of the operational equivalence can be interpreted.


CIM2014 aims to bring together different communities: those who create
mappings with those who rely on them for developing novel applications. We
are seeking to foster discussion and debate between the communities through
face-to-face discussion in breakout groups during the workshop. This will
be seeded through short presentations about real-world challenges and
state-of-the-art solutions. The goal is to produce a common vision of the
future for the communities.


Topics of Interest

   - Dynamic linking and matching
   - Collective Interpretation of linked data
   - Background knowledge in matching
   - Crowd-sourcing structured data
   - Incentive structures for creating and mapping data
   - Machine-learning over incomplete structured and semi-structured data
   - Provenance of matches and data
   - Justification of mappings
   - Approximate matching
   - Informed decision support


Workshop Format and Location

CIM2014 is a half-day workshop co-located with ISWC 2014. It  focuses on
community building and discussions.  It will involve:

   - Short talks: papers will be limited to 12 minutes with 3 minutes for
   questions.
   - Demonstration session: showcasing existing tools and approaches
   - Break-out groups formed according to interest
   - Feedback from break-out groups and open discussion
   - Discussions of future vision for the community


Paper Submission Instructions

We solicit papers of 5-12 pages that report on existing tools, work in
progress or future visions. Papers should aim to encourage debate and
discussion about the topics of the workshop.  The best research papers will
be nominated for publication in the Journal of Data Semantics.  Position
papers are also very welcome.


All papers have to be submitted electronically via the EasyChair conference
submission system https://www.easychair.org/conferences/?conf=cim20140.


All submissions must be in English. Submissions must be PDF files formatted in
the style of the Springer Publications format for Lecture Notes in Computer
Science (LNCS). For details on the LNCS style, see Springer's Author
Instructions (http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0).
CIM-2014 submissions are not anonymous.


At least one author of each accepted paper must register for the workshop
and the ISWC conference and present the paper at the workshop.  The
workshop proceedings will be published online.



Important Dates

Paper deadline: 14 July 2014

Notifications: 30 July 2014

Camera-ready version: 12 August 2014

Workshop: 19 or 20 October 2014



Chairs

   - Alasdair J G Gray, Heriot-Watt University, UK
   - Harry Halpin, W3C/ L'Institut de recherche et d'innovation du Centre
   Pompidou (IRI)/MIT
   - Fiona McNeill, Heriot-Watt University, UK


Programme Committee


   - Krisztian Balog, University of Amsterdam
   - Alan Bundy, University of Edinburgh
   - Vinay Chaudhri, SRI
   - Michelle Cheatham, Wright State University
   - James Cheney, University of Edinburgh
   - Oscar Corcho, Universidad Politécnica de Madrid
   - Eraldo Fernandes, Pontifícia Universidade Católica do Rio de Janeiro
   - Fausto Giunchiglia, University of Trento
   - Paul Groth, VU Amsterdam
   - Sajjad Hussain, INSERM, Paris, France
   - Shuai Ma, Beihang University
   - Alun Preece, Cardiff University
   - David Robertson, University of Edinburgh
   - Robert Stevens, University of Manchester
   - Frank van Harmelen, VU Amsterdam
   - Peter Winstanley, Scottish Government


Alasdair J G Gray
Lecturer in Computer Science, Heriot-Watt University, UK.
Email: