On 8/22/12 2:56 PM, Richard Wallis wrote:
Hi Karen,

I was not ignoring your previous question about where, in MARC terms, data
was coming from.  I need to talk with someone who was in the core of the
processing that produces the data.  Unfortunately I am currently being
thwarted by vacations.

Richard, I understand, and apologize if I appeared to be pushing too hard. In my own experience, requests for documentation are met with groans, especially by folks who'd rather be "doing something useful," like writing code. Unfortunately, it really helps to explain what you've done.

I think I've solved the question of where the place of publication comes from: 260 $a. The difference between the Web version and the triples version is in punctuation. I'm still looking at examples, but it's a slog since I'm re-creating records in the triples file with my minimal knowledge of "grep" -- a hammer, but the best darned hammer there is. Here are some examples:

#1
File:

<http://www.worldcat.org/oclc/43836713> <http://purl.org/library/placeOfPublication> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9
_:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name> "New York"


Web: (Using RDFa It Firefox plugin [1])
<http://www.worldcat.org/oclc/43836713> a schema:Book;
      library:placeOfPublication [ a schema:Place;
         schema:name "New York :"@en ];

#2

File:
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name> "Garden City, N.Y." .
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
<http://www.worldcat.org/oclc/524483> <http://purl.org/library/placeOfPublication> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .

Web:
<http://www.worldcat.org/oclc/524483> a schema:Book;
    library:holdingsCount "803"@en;
    library:oclcnum "524483"@en;
    library:placeOfPublication [ a schema:Place;
            schema:name "Garden City, N.Y.,"@en ];
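To make the comparison concrete, here is a minimal sketch in Python, assuming the only difference between the two forms is trailing separator punctuation carried over from 260 $a. The name normalize_place and the set of characters it strips are my own assumptions for illustration, not anything OCLC has documented.

```python
# Hypothetical helper: strip the trailing separator punctuation (" :", ",", ";")
# that appears in the RDFa strings but not in the downloaded triples.
TRAILING_PUNCTUATION = " :;,"


def normalize_place(name: str) -> str:
    """Return the place name without trailing separators.

    Periods are deliberately kept, because they can belong to
    abbreviations such as "N.Y.".
    """
    return name.rstrip(TRAILING_PUNCTUATION)


# Values copied from the two examples above: (file, web).
pairs = [
    ("New York", "New York :"),
    ("Garden City, N.Y.", "Garden City, N.Y.,"),
]

for file_value, web_value in pairs:
    # Report whether the web string matches the file string once normalized.
    print(file_value == normalize_place(web_value), repr(normalize_place(web_value)))
```

If other records turn up with different trailing punctuation, the character set would need to be extended; this only covers the cases seen so far.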

Another piece of information is that each instance of a place of publication string is given a new identity:

_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4a6da202X3aX138a1387049X3aXX2dX7bcc <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b32d4b9X3aX138a1316a9bX3aXX2dX5f92 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b5c4da3X3aX138a135d400X3aXX2dX515e <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b93edacX3aX138a1314f3eX3aXX2dX58e9 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4c810b47X3aX138a134150cX3aXX2dX5b77 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX1677 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX23e1 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX52ad903aX3aX138a12d336bX3aXX2dX5389 <http://schema.org/name> "Garden City, N.Y." .
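For anyone who would rather not do this with grep, here is a rough Python sketch that counts how many distinct blank nodes carry the same schema:name string. The regular expression is a simplification I am assuming is good enough for these statements (it does not handle escaped quotes inside literals), and the function name blank_nodes_by_name is made up for the example.

```python
import re
from collections import defaultdict

# Matches one statement of the form:  _:node <http://schema.org/name> "literal" .
# Simplification: literals containing escaped quotes are not handled.
NAME_TRIPLE = re.compile(r'(_:\S+)\s+<http://schema\.org/name>\s+"([^"]*)"\s*\.')


def blank_nodes_by_name(text: str) -> dict:
    """Map each schema:name literal to the set of blank nodes that carry it."""
    nodes = defaultdict(set)
    for node, name in NAME_TRIPLE.findall(text):
        nodes[name].add(node)
    return dict(nodes)


# Three of the statements listed above, used as sample input.
sample = (
    '_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .\n'
    '_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .\n'
    '_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .\n'
)

for name, nodes in blank_nodes_by_name(sample).items():
    # One line per place string, with the number of separate identities it has.
    print(name, len(nodes))
```

Run against the whole file, a count well above 1 for a string would confirm that each occurrence gets its own identity rather than sharing one.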


Where punctuation doesn't cloud the picture, these could eventually be linked to:
  http://id.loc.gov/authorities/names/n50068040.html
and:
  http://www.geonames.org/5118226/garden-city.html

and in that way could have a shared identity.
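As a sketch of what that linking step might look like, assuming a lookup table keyed by the place string as it appears in the triples file: the table and the function name shared_identities are hypothetical, and the only entry is the pair of URLs given above. A real table would have to be built by matching against the LC and GeoNames data.

```python
# Hypothetical lookup table, keyed by the place string from the triples file.
# The two URLs are the ones quoted above; nothing else is known to match yet.
SHARED_IDENTITIES = {
    "Garden City, N.Y.": [
        "http://id.loc.gov/authorities/names/n50068040.html",
        "http://www.geonames.org/5118226/garden-city.html",
    ],
}


def shared_identities(place_name: str) -> list:
    """Return known shared identifiers for a place string.

    Trailing separator punctuation (" :", ",", ";") is ignored so that the
    Web form of the string matches too; an empty list means no match.
    """
    key = place_name.rstrip(" :;,")
    return SHARED_IDENTITIES.get(key, [])


# The Web form of the string resolves to the same two identifiers.
for uri in shared_identities("Garden City, N.Y.,"):
    print(uri)
```

Matching on the bare string is of course ambiguous for common place names, so this is only a starting point.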

kc

p.s. Richard and I are on a list with someone who has loaded the triples into a database. I will ask if we can announce it here, and will also try to figure out how to use the SPARQL endpoint and provide some examples, if that is ok with the dc:creator of the database.

[1] javascript:location.href='http://www.w3.org/2012/pyRdfa/extract?format=turtle&uri='+escape(location.href)


In the meantime, can you let me have a few examples of where you are seeing
discrepancies between the download triples and the RDFa embedded in
WorldCat.org pages?

~Richard.

On 22 August 2012 19:08, Karen Coyle <[email protected]> wrote:

Richard, I've run into yet another area where documentation would be
helpful. There are differences between the schema.org/RDFa that is
embedded in WorldCat data and the exported WorldCat triples in the file.
One of those differences happens to be the source of the place of
publication, if I am reading it right. So, again, a request for
documentation on the fields included and their MARC source.

Thanks,

kc

On 8/17/12 8:38 AM, Richard Wallis wrote:

In case you missed the press release earlier this week.

You can now download a significant number of RDF triples describing the
most highly held 1.2 million resources in WorldCat.  Licensed under
ODC-BY.

I've posted more details on my blog:
http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/

~Richard.

--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



