Hugh,

An important and interesting issue, thanks for raising it, and thanks also to everyone else who contributed to this thread.

I tend to agree: A search function that allows looking for resources by name greatly increases the usefulness of any dataset, and providing such a function is always a good idea.

Let me ask you something, Hugh: Now that you've raised awareness of the issue, can you propose some concrete steps that we could take to improve the situation? Shall we review the datasets out there and flag those without search? Shall we write up a blog post or wiki page? Something else?

I want to point out that creating such a site search can be very simple for the dataset publisher. For example, at the old Berlin DBLP dataset [1], you will find a name search on the homepage. This was a last-minute hack, implemented in an hour using a pageful of JavaScript. It works by sending a SPARQL query to the dataset's SPARQL endpoint via AJAX, and redirecting to the best result. Certainly not the best search function you've ever seen, but really simple... If your dataset wraps a triple store, a relational database or a web API, then you can almost certainly use the search functions provided by the store/DB/API to implement this, and I would be surprised if it took more than half a day.
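
To make the idea concrete, here is a minimal sketch of that kind of name search. It is not the actual DBLP code (that was JavaScript in the browser); it is written in Python only for brevity. The assumptions are mine: resources carry an rdfs:label, the endpoint supports the SPARQL 1.1 string functions contains() and lcase(), results come back in the SPARQL JSON results format, and the endpoint URL and the function names are hypothetical.

```python
from urllib.parse import urlencode


def build_name_query(name: str, limit: int = 10) -> str:
    """Build a SPARQL query for resources whose rdfs:label contains `name`."""
    # Escape backslashes, quotes and line breaks so that the user input
    # stays inside the SPARQL string literal.
    escaped = (
        name.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        # Assumption: rdfs:label is the naming property used by the dataset.
        f'  FILTER (isLiteral(?label) && contains(lcase(str(?label)), lcase("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL protocol request."""
    # The result format would be requested via the HTTP Accept header,
    # e.g. "application/sparql-results+json".
    return endpoint + "?" + urlencode({"query": query})


def best_match(results: dict, name: str):
    """Pick the URI to redirect to from parsed SPARQL JSON results, or None."""
    wanted = name.strip().lower()
    best = None
    for binding in results["results"]["bindings"]:
        if binding["s"]["type"] != "uri":
            continue  # skip blank nodes, we want something linkable
        label = binding["label"]["value"]
        # Exact label matches rank first, then shorter labels.
        rank = (label.strip().lower() != wanted, len(label))
        if best is None or rank < best[0]:
            best = (rank, binding["s"]["value"])
    return best[1] if best else None
```

The page would send build_request_url("http://example.org/sparql", build_name_query("Telemann")) to the endpoint, parse the JSON response, and redirect to whatever best_match() returns. The "best result" ranking here (exact match, then shortest label) is just one plausible heuristic.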

Another example to which I've contributed, and which I quite like, is the search of the RDF book mashup [2], which works by wrapping the appropriate method of the Amazon web service API. The search results are also available as RDF (find it via autodiscovery links).

Bradley's mention of RDFa is worth highlighting: In an RDFa-enabled website, the local site search, which is probably already available, automatically doubles as a search for URIs. This is one of the many reasons why I'm becoming an RDFa fanboy -- it makes us create good linked data sites simply by following dusty old good practices for website design and deployment, such as providing site search!

Finally, allow me to be a bit smirky and quote below from an email I sent to this list 14 months ago. In it, I recount similar frustrations in finding entry points into a recently announced dataset -- RKB Explorer. It's good to see that this site has improved a lot since, but it's maybe a bit discouraging that we still face the same general problems more than a year later... Anyways, enjoy! ;-)

Next, let's talk about concrete steps that we can take to improve the situation.

Best,
Richard

[1] http://www4.wiwiss.fu-berlin.de/dblp/
[2] http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/#search


On 7 Nov 2007, at 21:46, Richard Cyganiak wrote:
Hugh,

This looks like it could be an awesome resource. Unfortunately I didn't have much luck getting any kind of data back from the services.

The "browse" function doesn't do anything useful for me. I searched for a wide variety of terms, including "the", "a" and "2003", in the first ten or so datasets, including the ones called Citeseer and DBLP. No results. What am I supposed to put into the search box?

I also tried to explore the datasets using SPARQL queries. I started with queries such as

  SELECT DISTINCT ?class WHERE { ?x a ?class }

to learn about the vocabulary used in the dataset. These queries return some results on some of the datasets (they time out on others), but clicking any of the results consistently showed a page with zero results. Same for opening in an RDF browser.

So in fact, despite honestly trying, the only way I could get any real data back from the services was by using the four example URIs provided at www.rkbexplorer.com .

Obviously a lot of work went into this. It's a shame that it's so hard to make any use of it because the last 5% are missing.

What are those last 5%?

1. A brief description of what each dataset actually is, and what sort of data it contains. The currently available information (who provided the data and some triple counts) is not enough.

2. A bunch of representative example URIs for each dataset.

3. A bunch of representative and interesting SPARQL queries against each dataset.

4. If possible, a note on what vocabulary (classes and properties) is used in each dataset. This would greatly simplify SPARQLing the datasets.

5. You should think really hard about “natural” navigation entry points into the datasets. Is there any natural “root” from which everything can be accessed? Is there a category system or class hierarchy that one can navigate along to find interesting stuff?

6. You should consider adding a few domain-specific search functions, such as the simple “Find Yourself” function provided at http://dblp.l3s.de/d2r/ .

I'm a bit frustrated because this looks like an amazingly great resource, but I can't actually get any clear feeling for its scope or quality or contents. This feels like exploring a pitch black room while wearing boxing gloves.

I'm very hopeful that you can greatly improve this experience with little effort.

Thanks a lot,
Richard






On 7 Feb 2009, at 13:23, Hugh Glaser wrote:


My proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*

Rationale:
One aspect of our Linking Data (not to mention our Linking Open Data) world is that we want people to link to our data - that is, I have published some stuff about something, with a URI, and I want people to be able to use that URI.

So my question to you, the publisher, is: "How easy is it for me to find the
URI your users want?"

My experience suggests it is not always very easy.
What is required at the minimum, I suggest, is a text search, so that if I
have a (boring string version of a) name that refers in my mind to
something, I can hope to find an (exciting Linked Data) URI of that thing.
I call this a projection from the Web to the Semantic Web.
rdfs:label or an equivalent usually provides the projection in the other direction.

At the risk of being seen as critical of the amazing efforts of all my
colleagues (if not also myself), this is rarely an easy thing to do.

Some recent experiences:
OpenCalais: as in my previous message on this list, I tried hard to find a
URI for Tim, but failed.
dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
find a URI for Telemann, but failed.
dbpedia: wanted Tim again. After clicking on a few web pages, none of which seemed to provide a search facility, I resorted to my usual method: look it up in wikipedia and then hack the URI and hope it works in dbpedia.
(Sorry to name specific sites, guys, but I needed a few examples.
And I am only asking for a little more, so that the fruits of your amazing
labours can be more widely appreciated!)
wordnet: [2] below

So I have access to Linked Data sites that I know (or at least strongly
suspect) have URIs I might want, but I can't find them.
How on earth do we expect your average punter to join this world?

What have I missed?
Searching, such as Sindice: Well yes, but should I really have to go off to a search engine to find a dbpedia URI? And when I look up "Telemann dbtune" I don't get any results. And I wanted the dbtune link, not some other link. Did I miss some links on web pages? Quite probably, but the basic problem still stands.
SPARQL: Well, yes. But we cannot seriously expect our users to formulate a SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp I need to put in? (see below [1])
A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although possibly there are none of Tim's URIs in his foaf file), if I can actually find the file; but for some reason I can't seem to find Telemann's foaf file.

If you are still doubting me, try finding a URI for Telemann in dbpedia without using an external link, just by following stuff from the home page. I managed to get a Telemann by using SPARQL without a regexp (it times out on any regexp), but unfortunately I get the asteroid.

Again, my proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*
Otherwise we end up in a silo, and the world passes us by.

Very best
Hugh

[And since we have to take our own medicine, I have added a "Just search"
box right at the top level of all the rkbexplorer.com domains, such as
http://wordnet.rkbexplorer.com/ ]


[1]
Dbtune finding of Telemann:
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "Telemann$") }

I tried
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "telemann$", "i") }
first, but got no results - not sure why.

[2]
<rant>
I cannot believe just how frustrating this stuff can be when you really try
to use it.
Because I looked at Sindice for telemann, I know that it is a word in
wordnet ( http://sindice.com/search?q=Telemann reports loads of
http://wordnet.rkbexplorer.com/ links).
Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
(ie not me).
Goes to
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
to find wordnet.
The link there is dead.
Strips off the last bit, to get to the home princeton wordnet page, and
clicks on the browser link I find - also dead.
Go back and look on the
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
page, and find the link to http://esw.w3.org/topic/WordNet , but that
doesn't help.
So finally, I do the obvious - google "wordnet rdf".
Of course I get lots of pages saying how available it is, and how exciting it is that we have it, and how it was produced; and somewhere in there I find a link: "Wordnet-RDF/RDDL Browser" at www.openhealth.org/RDDL/wnbrowse
Almost unable to contain myself with excitement, I click on the link to find a text box, and with trembling hands I type "Telemann" and click submit. If I show you what I got, you can go some way towards imagining my devastation:
"Using org.apache.xerces.parsers.SAXParser
Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
White spaces are required between publicId and systemId.
org.xml.sax.SAXParseException: White spaces are required between publicId
and systemId."

Does the emperor have any clothes at all?
</rant>



