PS: Kyle, that's your own version? That's... sort of kind of machine
readable. Well, not really. I can't figure out quite what's going on
there, the label/value pairs are just stuffed in single JavaScript
string literals, separated by newlines, or sometimes (but sometimes not)
with "Assigned code:" strings, etc.
That's in fact a little bit harder to parse than what I'm doing against
LC. I'm running CSS selectors against the HTML; I'm not having any
difficulty parsing, the problem is that the format can change without
notice. But yours seems harder to parse to me, am I missing something?
In the end, all I need is a list of pairs, code to label. I'll be
looking up from code, so I don't even care about "alternate labels",
really.
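To make the "list of pairs, code to label" concrete, here is a minimal sketch in Python of the kind of lookup table I mean. It is only an illustration: the sample markup (a plain two-column table), the class name CodeTableParser, and the function parse_codes are all made up for this example, and the real LC page may be structured quite differently. It uses the standard library's html.parser rather than CSS selectors, just to stay self-contained.

```python
from html.parser import HTMLParser


class CodeTableParser(HTMLParser):
    """Collect code -> label pairs from two-column table rows.

    Assumes (hypothetically) that each data row looks like
    <tr><td>CODE</td><td>LABEL</td></tr>; header rows using <th> are skipped.
    """

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._row = None   # cell texts of the row being read, None outside a row
        self._cell = None  # text fragments of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Keep only rows with a non-empty code and a label.
            if len(self._row) >= 2 and self._row[0]:
                self.pairs[self._row[0]] = self._row[1]
            self._row = None
            self._cell = None


def parse_codes(html):
    """Return a dict mapping each code to its label."""
    parser = CodeTableParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample markup, not copied from the LC page.
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td>n-us---</td><td>United States</td></tr>
  <tr><td>e-fr---</td><td>France</td></tr>
  <tr><td>a-ja---</td><td>Japan</td></tr>
</table>
"""

codes = parse_codes(SAMPLE_HTML)
print(codes["e-fr---"])
```

Since I only ever look up from code, a plain dict keyed on the code is enough. The fragile part is still the assumption about the markup, which is exactly what breaks when the page changes.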
On 6/22/2011 5:57 PM, Kyle Banerjee wrote:
I went through a process similar to what you describe sometime back for a
tool I made (i.e. I could find no easily downloadable info). You can
download something that will be easier to parse from
http://calculate.alptown.com/gac.js
It's probably not 100% accurate as I haven't downloaded for quite a while.
But catalogers have me correct errors they discover and there are about 800
unique visitors per day so I assume they notice most things.
It would be nice if this kind of data could be provided in a straightforward
format.
kyle
On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind wrote:
Can anyone remind me if there's a machine readable copy of the MARC
geographic codes available at any persistent URL?
They're in HTML at
http://www.loc.gov/marc/geoareas/gacs_code.html.
I actually had a script that automatically downloaded from there and
"scraped" the HTML -- but sometime since I wrote the script, the HTML
structure on the page changed and it broke.
(I kind of thought that was unlikely since that HTML page itself was
machine generated -- but I guess they changed the software that generated
it. Certainly I knew that scraping HTML was a bad thing to rely on... which
is why I hope LC provides this in some format less likely to change?)