On 8/23/12 12:15 AM, Richard Wallis wrote:
Hi Karen,

Those that want to play with this data in their own triplestore may be
interested in my post about doing that myself: Putting WorldCat Data Into
A Triple Store
<http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/>

Thanks. Now I know what I was doing wrong in my SPARQL queries -- left off the <>. Back I go to try again.
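
For anyone hitting the same problem: SPARQL expects a full IRI to be wrapped in angle brackets. The sketch below only builds the query text; it is not the query that was actually run, and the helper names iri() and place_query() are made up for illustration. The two property IRIs are the ones that appear in the examples quoted further down.

```python
def iri(value: str) -> str:
    """Wrap a full IRI in angle brackets, as SPARQL syntax requires."""
    return value if value.startswith("<") else "<" + value + ">"


def place_query(record_iri: str) -> str:
    """Build a SELECT query for the place-of-publication name of one record."""
    place_prop = iri("http://purl.org/library/placeOfPublication")
    name_prop = iri("http://schema.org/name")
    return (
        "SELECT ?name WHERE {\n"
        "  " + iri(record_iri) + " " + place_prop + " ?place .\n"
        "  ?place " + name_prop + " ?name .\n"
        "}"
    )


if __name__ == "__main__":
    print(place_query("http://www.worldcat.org/oclc/524483"))
```

Without the brackets, a SPARQL parser would try to read http: as a prefix, which is one way such a query can fail.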



I am intrigued by your identification of punctuation differences; it seems
like one of the outputs has been through an extra cleanup step.  I will
find out.

My guess is that the person working with the code noticed the ending punctuation and thought "Now THAT'S stupid" and removed it. Good on them!

kc


On the creation of multiple identifiers for each instance of a place name -
this is a symptom of the way the experimental data is created using what
are called blank nodes.  Ideally we would have minted a URI for each unique
place and linked all references to it.  Unfortunately, this was not easily
achievable, as part of the experiment, on top of production WorldCat.
  Solving issues such as this is on the agenda as our work in this area
evolves.
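
As a rough illustration of "one URI per unique place", the sketch below groups blank nodes by their schema:name literal and assigns each distinct literal a single identifier. It is not how the WorldCat data is produced. The http://example.org/place/ namespace, the function names, and the exact-string matching are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches one N-Triples statement of the form:
#   _:bnode <http://schema.org/name> "literal" .
NAME_TRIPLE = re.compile(
    r'^(_:\S+)\s+<http://schema\.org/name>\s+"((?:[^"\\]|\\.)*)"\s*\.\s*$'
)

# Hypothetical namespace, used only for illustration.
PLACE_BASE = "http://example.org/place/"


def group_blank_nodes(lines):
    """Map each distinct name literal to the blank nodes that carry it."""
    groups = defaultdict(list)
    for line in lines:
        match = NAME_TRIPLE.match(line.strip())
        if match:
            bnode, name = match.groups()
            groups[name].append(bnode)
    return dict(groups)


def mint_uris(groups):
    """Assign one illustrative URI per distinct name, in sorted order."""
    return {
        name: PLACE_BASE + str(number)
        for number, name in enumerate(sorted(groups), start=1)
    }


if __name__ == "__main__":
    sample = [
        '_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .',
        '_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .',
        '_:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name> "New York" .',
    ]
    groups = group_blank_nodes(sample)
    for name, uri in mint_uris(groups).items():
        print(uri, name, len(groups[name]), "blank nodes")
```

Matching on the exact string is the weakest possible notion of "same place"; two different towns with the same name would be merged, so a real solution would need more than the literal.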

Keep the comments coming - they are very helpful.

~Richard.

On 23 August 2012 00:56, Karen Coyle <[email protected]> wrote:

On 8/22/12 2:56 PM, Richard Wallis wrote:

Hi Karen,

I was not ignoring your previous question about where, in MARC terms, data
was coming from.  I need to talk with someone who was in the core of the
processing that produces the data.  Unfortunately I am currently being
thwarted by vacations.

Richard, I understand, and apologize if I appeared to be pushing too hard.
In my own experience, requests for documentation are met with groans,
especially by folks who'd rather be "doing something useful," like writing
code. Unfortunately, it really helps to explain what you've done.

I think I've solved the question of where the place of publication comes
from: 260 $a. The difference between the Web version and the triples
version is in punctuation. I'm still looking at examples, but it's a slog
since I'm re-creating records in the triples file with my minimal knowledge
of "grep" -- a hammer, but the best darned hammer there is. Here are some
examples:

#1
File:

<http://www.worldcat.org/oclc/43836713>
<http://purl.org/library/placeOfPublication>
_:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9
_:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name>
"New York"


Web: (Using RDFa It Firefox plugin [1])
<http://www.worldcat.org/oclc/43836713>
a schema:Book;
       library:placeOfPublication [ a schema:Place;
          schema:name "New York :"@en ];

#2

File:
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name>
"Garden City, N.Y." .
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Place> .
<http://www.worldcat.org/oclc/524483>
<http://purl.org/library/placeOfPublication>
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .

Web:
<http://www.worldcat.org/oclc/524483>
a schema:Book;
     library:holdingsCount "803"@en;
     library:oclcnum "524483"@en;
     library:placeOfPublication [ a schema:Place;
             schema:name "Garden City, N.Y.,"@en ];
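
In both examples the Web value differs from the File value only by trailing ISBD punctuation (" :" and ","). A minimal sketch of a cleanup that would turn one into the other follows. Whether the export actually applies such a step is unknown, and the function name and the exact set of characters stripped are assumptions.

```python
import re

# Assumption: only trailing ISBD separators (colon, semicolon, comma) are
# removed. A final period is kept so abbreviations such as "N.Y." survive.
TRAILING_ISBD = re.compile(r"\s*[:;,]\s*$")


def strip_trailing_punctuation(place: str) -> str:
    """Return the 260 $a string without trailing ISBD punctuation."""
    return TRAILING_ISBD.sub("", place).strip()


if __name__ == "__main__":
    for web_value in ("New York :", "Garden City, N.Y.,"):
        print(repr(web_value), "->", repr(strip_trailing_punctuation(web_value)))
```

Applied to the two Web strings above, this yields the strings found in the file; internal commas such as the one in "Garden City, N.Y." are left alone because the pattern is anchored to the end of the string.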

Another piece of information is that each instance of a place of
publication string is given a new identity:

_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4a6da202X3aX138a1387049X3aXX2dX7bcc <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4b32d4b9X3aX138a1316a9bX3aXX2dX5f92 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4b5c4da3X3aX138a135d400X3aXX2dX515e <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4b93edacX3aX138a1314f3eX3aXX2dX58e9 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4c810b47X3aX138a134150cX3aXX2dX5b77 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX1677 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX23e1 <http://schema.org/name>
"Garden City, N.Y." .
_:AX2dX52ad903aX3aX138a12d336bX3aXX2dX5389 <http://schema.org/name>
"Garden City, N.Y." .


Where punctuation doesn't cloud the picture, these could eventually be
linked to:
   
http://id.loc.gov/authorities/names/n50068040.html
and:
   
http://www.geonames.org/5118226/garden-city.html

and in that way could have a shared identity.

kc

p.s. Richard and I are on a list with someone who has loaded the triples
into a database. I will ask if we can announce it here, and will also try
to figure out how to use the SPARQL endpoint and provide some examples, if
that is ok with the dc:creator of the database.

[1] javascript:location.href='http://www.w3.org/2012/pyRdfa/extract?format=turtle&uri='+escape(location.href)


In the meantime, can you let me have a few examples of where you are
seeing discrepancies between the downloaded triples and the RDFa embedded in
WorldCat.org pages?

~Richard.

On 22 August 2012 19:08, Karen Coyle <[email protected]> wrote:

  Richard, I've run into yet another area where documentation would be
helpful. There are differences between the schema.org/RDFa that is
embedded in WorldCat data and the exported WorldCat triples in the file.
One of those differences happens to be the source of the place of
publication, if I am reading it right. So, again, a request for
documentation on the fields included and their MARC source.

Thanks,

kc

On 8/17/12 8:38 AM, Richard Wallis wrote:

  In case you missed the press release earlier this week.
You can now download a significant number of RDF triples describing the
most highly held 1.2 million resources in WorldCat.  Licensed under
ODC-BY.

I've posted more details on my blog:
http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/
~Richard.

  --
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



