Re: Can we lower the LD entry cost please (part 1)?

Richard Cyganiak Sun, 08 Feb 2009 18:01:41 -0800


Hugh,

An important and interesting issue, thanks for raising it, and thanks also to everyone else who contributed to this thread.

I tend to agree: A search function that allows looking for resources by name greatly increases the usefulness of any dataset, and providing such a function is always a good idea.

Let me ask you something, Hugh: Now that you've raised awareness of the issue, can you propose some concrete steps that we could take to improve the situation? Shall we review the datasets out there and flag those without search? Shall we write up a blog post or wiki page? Something else?

I want to point out that creating such a site search can be very simple for the dataset publisher. For example, at the old Berlin DBLP dataset [1], you will find a name search on the homepage. This was a last-minute hack, implemented in an hour using a pageful of JavaScript. It works by sending a SPARQL query to the dataset's SPARQL endpoint via AJAX, and redirecting to the best result. Certainly not the best search function you've ever seen, but really simple... If your dataset wraps a triple store or a relational database or a web API, then you almost certainly can use the search functions provided by the store/DB/API to implement this, and I would be surprised if it takes more than half a day.
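
The query-and-redirect approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the JavaScript of the actual page, and it is not the code running on the DBLP site. It assumes that names are stored as rdfs:label values and that the endpoint returns the standard SPARQL JSON results format; the function names and the example.org endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Characters with special meaning in a regular expression.
REGEX_META = set(r"\.^$*+?()[]{}|")


def sparql_regex_literal(text):
    """Escape user input so it is matched literally by SPARQL regex()."""
    pattern = "".join("\\" + ch if ch in REGEX_META else ch for ch in text)
    # Second pass: escape for the surrounding SPARQL string literal.
    return pattern.replace("\\", "\\\\").replace('"', '\\"')


def build_name_query(name, limit=20):
    """Build a case-insensitive label search (rdfs:label is an assumption)."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{sparql_regex_literal(name)}", "i")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request, as the SPARQL protocol allows."""
    return endpoint + "?" + urlencode({"query": query})


def best_match(results, name):
    """Pick the URI to redirect to: exact label first, then shortest label."""
    bindings = results["results"]["bindings"]
    if not bindings:
        return None
    wanted = name.casefold()

    def rank(binding):
        label = binding["label"]["value"]
        return (label.casefold() != wanted, len(label))

    return min(bindings, key=rank)["s"]["value"]


# Hypothetical endpoint; fetching the URL and redirecting is left out.
url = build_request_url("http://example.org/sparql", build_name_query("Telemann"))
```

The ranking rule (exact match, then shortest label) is only one plausible reading of "best result"; any cheap heuristic would do for a first version.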

Another example to which I've contributed, and which I like quite much, is the search of the RDF book mashup [2], which works by wrapping the appropriate method of the Amazon web service API. The search results are also available as RDF (find it via autodiscovery links).

Bradley's mention of RDFa is worth highlighting: In an RDFa-enabled website, the local site search, which is probably already available, automatically doubles as a search for URIs. This is one of the many reasons why I'm becoming an RDFa fanboy -- it makes us create good linked data sites simply by following dusty old good practices for website design and deployment, such as providing site search!

Finally, allow me to be a bit smirky and quote below from an email I sent to this list 14 months ago. In it, I recount similar frustrations in finding entry points into a recently announced dataset -- RKB Explorer. It's good to see that this site has improved a lot since, but it's maybe a bit discouraging that we still face the same general problems more than a year later... Anyways, enjoy! ;-)

Next, let's talk about concrete steps that we can take to improve the situation.


Best,
Richard

[1] http://www4.wiwiss.fu-berlin.de/dblp/
[2] http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/#search


On 7 Nov 2007, at 21:46, Richard Cyganiak wrote:

Hugh,
This looks like it could be an awesome resource. Unfortunately I didn't have much luck getting any kind of data back from the services.
The "browse" function doesn't do anything useful for me. I searched for a wide variety of terms, including "the", "a" and "2003" in the first ten or so datasets, including the one called Citeseer and DBLP. No results. What am I supposed to put into the search box?
I also tried to explore the datasets using SPARQL queries. I started with queries such as
  SELECT DISTINCT ?class WHERE { ?x a ?class }
to learn about the vocabulary used in the dataset. These queries return some results on some of the datasets (they time out on others), but clicking any of the results consistently showed a page with zero results. Same for opening in an RDF browser.
So in fact, despite honestly trying, the only way I could get any real data back from the services was by using the four example URIs provided at www.rkbexplorer.com .
Obviously a lot of work went into this. It's a shame that it's so hard to make any use of it because the last 5% are missing.
What are those last 5%?
1. A brief description of what each dataset actually is, and what sort of data it contains. The currently available information (who provided the data and some triple counts) is not enough.
2. A bunch of representative example URIs for each dataset.
3. A bunch of representative and interesting SPARQL queries against each dataset.
4. If possible, a note on what vocabulary (classes and properties) are used in each dataset. This would greatly simplify SPARQLing the datasets.
5. You should think really hard about natural navigation entry points into the datasets. Is there any natural root from which everything can be accessed? Is there a category system or class hierarchy that one can navigate along to find interesting stuff?
6. You should consider adding a few domain-specific search functions, such as the simple Find Yourself function provided at http://dblp.l3s.de/d2r/.
I'm a bit frustrated because this looks like an amazingly great resource, but I can't actually get any clear feeling for its scope or quality or contents. This feels like exploring a pitch black room while wearing boxing gloves.
I'm very hopeful that you can greatly improve this experience with little effort.
Thanks a lot,
Richard







On 7 Feb 2009, at 13:23, Hugh Glaser wrote:

My proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*

Rationale:
One aspect of our Linking Data (not to mention our Linking Open Data) world
is that we want people to link to our data - that is, I have published some
stuff about something, with a URI, and I want people to be able to use that
URI.
So my question to you, the publisher, is: "How easy is it for me to find the
URI your users want?"

My experience suggests it is not always very easy.
What is required at the minimum, I suggest, is a text search, so that if I
have a (boring string version of a) name that refers in my mind to
something, I can hope to find an (exciting Linked Data) URI of that thing.
I call this a projection from the Web to the Semantic Web.
rdfs:label or equivalent usually provides the other one.

At the risk of being seen as critical of the amazing efforts of all my
colleagues (if not also myself), this is rarely an easy thing to do.

Some recent experiences:
OpenCalais: as in my previous message on this list, I tried hard to find a
URI for Tim, but failed.
dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
find a URI for a Telemann, but failed.
dbpedia: wanted Tim again. After clicking on a few web pages, none of which
seemed to provide a search facility, I resorted to my usual method:- look it
up in wikipedia and then hack the URI and hope it works in dbpedia.
(Sorry to name specific sites, guys, but I needed a few examples.
And I am only asking for a little more, so that the fruits of your amazing
labours can be more widely appreciated!)
wordnet: [2] below
So I have access to Linked Data sites that I know (or at least strongly
suspect) have URIs I might want, but I can't find them.
How on earth do we expect your average punter to join this world?

What have I missed?
Searching, such as Sindice: Well yes, but should I really have to go off to
a search engine to find a dbpedia URI? And when I look up "Telemann
dbtune" I don't get any results. And I wanted the dbtune link, not some
other link.
Did I miss some links on web pages? Quite probably, but the basic problem
still stands.
SPARQL: Well, yes. But we cannot seriously expect our users to formulate a
SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp
I need to put in? (see below [1])
A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although
possibly there are none of Tim's URIs in his foaf file), if I can actually
find the file; but for some reason I can't seem to find Telemann's foaf
file.
If you are still doubting me, try finding a URI for Telemann in dbpedia
without using an external link, just by following stuff from the home page.
I managed to get a Telemann by using SPARQL without a regexp (it times out
on any regexp), but unfortunately I get the asteroid.

Again, my proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*
Otherwise we end up in a silo, and the world passes us by.

Very best
Hugh
[And since we have to take our own medicine, I have added a "Just search"
box right at the top level of all the rkbexplorer.com domains, such as
http://wordnet.rkbexplorer.com/ ]


[1]
Dbtune finding of Telemann:
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "Telemann$") }

I tried
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "telemann$", "i") }
first, but got no results - not sure why.
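
For what it is worth, the case-insensitive pattern should match everything the case-sensitive one does, so an empty result points at the endpoint rather than at the regexp. The following minimal sketch checks that locally, using Python's re module as a stand-in for SPARQL's regex() (an assumption that holds for a pattern this simple). The sample names are hypothetical and are not taken from dbtune.

```python
import re

# Hypothetical literal values, standing in for ?name bindings.
names = [
    "Georg Philipp Telemann",
    "TELEMANN",
    "Telemann (asteroid)",
    "telemann",
]


def matches(pattern, flags=""):
    """Emulate FILTER regex(?name, pattern, flags) over the sample names."""
    py_flags = re.IGNORECASE if "i" in flags else 0
    return [n for n in names if re.search(pattern, n, py_flags)]


case_sensitive = matches("Telemann$")
case_insensitive = matches("telemann$", "i")

# Every case-sensitive hit should also be a case-insensitive hit.
assert set(case_sensitive) <= set(case_insensitive)
```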

[2]
<rant>
I cannot believe just how frustrating this stuff can be when you really try
to use it.
Because I looked at Sindice for telemann, I know that it is a word in
wordnet ( http://sindice.com/search?q=Telemann reports loads of
http://wordnet.rkbexplorer.com/ links).
Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
(ie not me).
Goes to
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
to find wordnet.
The link there is dead.
Strips off the last bit, to get to the home princeton wordnet page, and
clicks on the browser link I find - also dead.
Go back and look on the
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
page, and find the link to http://esw.w3.org/topic/WordNet , but that
doesn't help.
So finally, I do the obvious - google "wordnet rdf".
Of course I get lots of pages saying how available it is, and how exciting
it is that we have it, and how it was produced; and somewhere in there I
find a link: "Wordnet-RDF/RDDL Browser" at www.openhealth.org/RDDL/wnbrowse
Almost unable to contain myself with excitement, I click on the link to find
a text box, and with trembling hands I type "Telemann" and click submit.
If I show you what I got, you can come some way to imagining my devastation:
"Using org.apache.xerces.parsers.SAXParser
Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
White spaces are required between publicId and systemId.
org.xml.sax.SAXParseException: White spaces are required between publicId
and systemId."

Does the emperor have any clothes at all?
</rant>
