Re: [CODE4LIB] Describe sub-collections in DCAT - advice very much appreciated

2016-07-06 Thread Ethan Gruber
Sorry, to be a little more constructive:

If you can describe the difference between Europeana's functionality now
and your vision for your CKAN implementation, that would be helpful for
providing advice.

On Wed, Jul 6, 2016 at 10:36 AM, Ethan Gruber <ewg4x...@gmail.com> wrote:

> Are these GLAMs also putting cultural heritage data into Europeana? You
> can already filter by country (that holds the work) in Europeana. There are
> 6 million objects from the Netherlands. Your energy might be better spent
> either harvesting Dutch material back out of Europeana into a separate
> Netherlands-only interface or by focusing on integrating smaller
> institutions into Europeana via OAI-PMH.
>
> In fact, your own material is in Europeana:
> http://www.europeana.eu/portal/search?f%5BCOUNTRY%5D%5B%5D=netherlands%5BTYPE%5D%5B%5D=SOUND=
>
> Ethan
>
> On Tue, Jul 5, 2016 at 12:19 PM, Johan Oomen <joo...@beeldengeluid.nl>
> wrote:
>
>> Good afternoon,
>>
>> In the Netherlands, we’re working on overhauling our current (OAI-PMH)
>> aggregation infrastructure towards a more distributed model. The aim is to
>> create a comprehensive collection of digitised cultural heritage objects
>> held by GLAMs across the country. A major component of the new
>> infrastructure is a registry of collections. We are using CKAN as the
>> data management system for these collections.
>>
>> We are currently installing and configuring CKAN, and use DCAT for
>> describing datasets. We are interested in seeing other examples of
>> registries that describe digital heritage collections using the CKAN
>> software. One of the challenges we encounter is describing multi-level
>> datasets, i.e. collections and sub-collections, in the context of DCAT. An
>> example is a data provider in the Netherlands that provides an aggregated
>> oral history dataset for the target audience ‘oral history’. We registered this
>> aggregated dataset, but we also want to register individual collections for
>> participating organisations. Therefore, the aggregated dataset is divided
>> into parts using XPath, XSLT, etc. Now we want to explicitly mark each
>> dataset part as a sub-dataset of the aggregated dataset, and vice versa.
>>
>> A question to this community: do you have implementations that use a CKAN-
>> based registry for digital heritage collections, and have you also dealt with
>> this issue of describing sub-collections in DCAT? How did you manage this?
>>
>> Your help is much appreciated,
>>
>> Best wishes,
>>
>> Johan Oomen
>> Netherlands Institute for Sound and Vision
>> @johanoomen
>
>
>
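
For readers hunting for a concrete starting point on the sub-collection question: DCAT itself does not define a sub-dataset property, and one common approach is simply to type each level as a dcat:Dataset and link the levels with dct:hasPart / dct:isPartOf. The sketch below shows that pattern with Python's rdflib; the URIs and titles are entirely hypothetical, and nothing here is prescribed by the thread itself.

# Minimal sketch: an aggregated dcat:Dataset and one sub-dataset, linked in
# both directions with Dublin Core part/whole properties. All URIs are made up.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
EX = Namespace("http://example.org/dataset/")   # hypothetical base URI

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

parent = EX["oral-history"]          # the aggregated oral history dataset
child = EX["oral-history-org1"]      # the slice for one participating organisation

g.add((parent, RDF.type, DCAT.Dataset))
g.add((parent, DCTERMS.title, Literal("Aggregated oral history dataset", lang="en")))
g.add((parent, DCTERMS.hasPart, child))

g.add((child, RDF.type, DCAT.Dataset))
g.add((child, DCTERMS.title, Literal("Oral history collection of organisation 1", lang="en")))
g.add((child, DCTERMS.isPartOf, parent))

print(g.serialize(format="turtle"))

Whether CKAN's DCAT tooling will round-trip these particular properties is a separate question worth testing against your own installation.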


Re: [CODE4LIB] Describe sub-collections in DCAT - advice very much appreciated

2016-07-06 Thread Ethan Gruber
Are these GLAMs also putting cultural heritage data into Europeana? You can
already filter by country (that holds the work) in Europeana. There are 6
million objects from the Netherlands. Your energy might be better spent
either harvesting Dutch material back out of Europeana into a separate
Netherlands-only interface or by focusing on integrating smaller
institutions into Europeana via OAI-PMH.

In fact, your own material is in Europeana:
http://www.europeana.eu/portal/search?f%5BCOUNTRY%5D%5B%5D=netherlands%5BTYPE%5D%5B%5D=SOUND=

Ethan

On Tue, Jul 5, 2016 at 12:19 PM, Johan Oomen 
wrote:

> Good afternoon,
>
> In the Netherlands, we’re working on overhauling our current (OAI-PMH)
> aggregation infrastructure towards a more distributed model. The aim is to
> create a comprehensive collection of digitised cultural heritage objects
> held by GLAMs across the country. A major component of the new
> infrastructure is a registry of collections. We are using CKAN as the
> data management system for these collections.
>
> We are currently installing and configuring CKAN, and use DCAT for
> describing datasets. We are interested in seeing other examples of
> registries that describe digital heritage collections using the CKAN
> software. One of the challenges we encounter is describing multi-level
> datasets, i.e. collections and sub-collections, in the context of DCAT. An
> example is a data provider in the Netherlands that provides an aggregated
> oral history dataset for the target audience ‘oral history’. We registered this
> aggregated dataset, but we also want to register individual collections for
> participating organisations. Therefore, the aggregated dataset is divided
> into parts using XPath, XSLT, etc. Now we want to explicitly mark each
> dataset part as a sub-dataset of the aggregated dataset, and vice versa.
>
> A question to this community: do you have implementations that use a CKAN-
> based registry for digital heritage collections, and have you also dealt with
> this issue of describing sub-collections in DCAT? How did you manage this?
>
> Your help is much appreciated,
>
> Best wishes,
>
> Johan Oomen
> Netherlands Institute for Sound and Vision
> @johanoomen


Re: [CODE4LIB] Anything Interesting Going on in Archival Metadata?

2016-05-24 Thread Ethan Gruber
There's a fair amount of innovation taking place with respect to linked
data in archives, but I don't think it's as well advertised as what's been
taking place in libraries in North America. The highest profile project in
the archival realm is Social Networks and Archival Context (
http://socialarchive.iath.virginia.edu/), which is focused mainly on
archival authorities, but there's a tremendous potential in being able to
aggregate archival content related to these authorities. Authorities and
archival content can be, and are being, modelled as linked open data, but
there's no real standard for how to do this in the field. A group is
working on a conceptual reference model for archival collections, but the
modelling of people and their relationships is bold new territory. I've
done some work on this myself using a variety of existing ontologies and
software platforms to connect pieces from our archives, digital library,
authorities, and museum objects together into a cohesive framework (you can
read more at http://eaditor.blogspot.com/ and
http://numishare.blogspot.com/2016/03/updating-mantis-and-igch-incorporating.html
).

It is also possible to use CIDOC-CRM for the modelling of people and their
relationships and events (same for using the CRM to model archival
collections). CIDOC-CRM is rarely, if ever, discussed in code4lib despite
its 'importance' in the cultural heritage sector (predominantly in Europe).
I've had difficulty getting discussions of modeling authorities in RDF off
the ground; some grant applications along those lines have fallen short.

Ethan

On Tue, May 24, 2016 at 9:57 AM, Matt Sherman 
wrote:

> Hi all,
>
> I was recently talking with some folks about some archives related
> things and realized that while I've heard a lot recently about
> different projects, advancements, and issues within library specific
> metadata, and its associated concerns, I have not heard as much
> recently about metadata in the archives realm.  Is there much going on
> there?  Is linked data even useful in a setting with extremely unique
> materials?  Is this a stupid question?  I don't know, but I am curious
> to hear if there are any interesting things people are doing in
> archival metadata or any challenges folks are working to overcome.
>
> Matt Sherman
>


Re: [CODE4LIB] question on harvesting RDF

2016-05-09 Thread Ethan Gruber
I don't recommend using different properties that have the same basic
semantic meaning for those different contexts (dc:subject vs.
dcterms:subject). In a linked data environment, I don't recommend using
Dublin Core Elements at all, but only dcterms. It is possible to harvest
subject terms regardless of whether the object is a literal or a URI, but the
harvester might have to take some additional action to generate a
human-readable result from an LCSH URI.

1. The harvester goes out and fetches the machine-readable data for
http://id.loc.gov/authorities/subjects/sh85002782 to get the label
2. You import the RDF for LCSH into your system so that an OPTIONAL line
can be inserted into SPARQL (assuming you are using SPARQL) to get the
skos:prefLabel for the URI directly from your own system.
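
A rough sketch of option 2, using Python's rdflib; the file names are illustrative, and it assumes the relevant slice of LCSH RDF has been loaded into the same graph as your own records:

# Resolve subject labels locally: literal subjects come through as-is, and URI
# subjects pick up a skos:prefLabel from the imported LCSH triples (if present).
from rdflib import Graph

g = Graph()
g.parse("my_records.ttl", format="turtle")   # your own dcterms:subject triples (assumed file)
g.parse("lcsh_subset.nt", format="nt")       # imported LCSH RDF (assumed file)

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?item ?subject ?label WHERE {
  ?item dcterms:subject ?subject .
  OPTIONAL { ?subject skos:prefLabel ?label }
}
"""
for row in g.query(query):
    # ?label is bound only for URI subjects with an imported prefLabel;
    # literal subjects simply come back with no label.
    print(row.item, row.subject, row.label)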

I'd suggest discussing these options with developers that may potentially
harvest your data, or at least provide a means to developers to give you
feedback so that you can deliver a web service that makes harvesting as
efficient as possible.

I hope this is useful. I think there are many possible solutions. But, in
sum, don't use dc:subject and dcterms:subject simultaneously.

Ethan

On Mon, May 9, 2016 at 1:58 PM, English, Eben  wrote:

> Hello all,
>
> A little context: the MODS and RDF Descriptive Metadata Subgroup
> (
> https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup
> )
> is a group of cultural institutions working together to model MODS XML
> as RDF.
>
> Our project diverges from previous efforts in this domain in that we're
> trying to come up with a model that takes more advantage of widely-used
> vocabularies and namespaces, avoiding blank nodes at all costs.
>
> As we work through the list of MODS elements, we've been stumbling on a
> few thorny issues, and with our goal of making our data as shareable as
> possible, we agreed that it would be helpful to try and get the input of
> folks who have more experience in harvesting and parsing RDF from the
> proliferation of data providers existing in the real world (see
> https://datahub.io/dataset for a great list).
>
> Specifically, when consuming RDF from a new data source, how big of a
> problem are the following issues:
>
>
> #1. Triples where the object may be a string literal or a URI
>
> For example, the predicate 'dc:subject' from the Dublin Core Elements
> vocabulary has no defined range, which means it can be used with both
> literal and non-literal values
> (
> http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dc:subject
> ).
>
> So one could have both in a data store:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dc:subject
>  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> ... versus ...
>
>
> #2. Using multiple predicates with similar/overlapping definitions,
> depending on the value of the object
>
> For example, when expressing the subject of a work, using different
> predicates depending on whether there is an existing URI for a topic or
> not:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dcterms:subject
>  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> We're wondering which approach is less problematic from a Linked
> Data-harvesting standpoint. Issue #1 requires that the parser be
> prepared to handle different types of values from the same predicate,
> but issue #2 involves parsing an additional namespace and predicate, etc.
>
> Any thoughts, suggestions, or comments would be greatly appreciated.
>
> Thanks,
> Eben
>
> --
> Eben English | Boston Public Library
> Web Services Developer
> 617-859-2238 |eengl...@bpl.org
>
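
To make issue #1 concrete from the consumer's side, here is a hedged sketch (file name and behaviour are illustrative) of the "additional action" mentioned above: the harvester branches on the node type and, for URIs, dereferences them to pick up a label, along the lines of option 1.

# Handle a predicate whose object may be a literal or a URI.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC, SKOS

harvested = Graph().parse("provider_dump.ttl", format="turtle")  # assumed dump

def subject_label(node):
    if isinstance(node, Literal):
        return str(node)                   # already human-readable
    if isinstance(node, URIRef):
        # Fetch the RDF behind the URI (e.g. LCSH); in practice you may need
        # an explicit format or Accept header, plus caching of the results.
        remote = Graph().parse(str(node))
        return str(remote.value(node, SKOS.prefLabel))
    return None

for item, subj in harvested.subject_objects(DC.subject):
    print(item, subject_label(subj))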


Re: [CODE4LIB] Good Database Software for a Digital Project?

2016-04-15 Thread Ethan Gruber
There are countless ways to approach the problem, but I suggest beginning
with tools that are within the area of expertise of your staff. Mapping
disparate structured formats into a single Solr instance for fast search
and retrieval is one possibility.
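
As a hedged illustration of that suggestion (core name, field names, and source file are all hypothetical), pushing rows from a structured export into Solr can be as small as this Python sketch:

# Index an annotated bibliography export into Solr over HTTP.
import csv
import requests

SOLR_UPDATE = "http://localhost:8983/solr/bibliography/update?commit=true"  # assumed core

docs = []
with open("annotated_bibliography.csv", newline="") as f:
    for row in csv.DictReader(f):
        docs.append({
            "id": row["id"],
            "title": row["title"],
            "annotation": row["annotation"],
        })

# Solr's update handler accepts a JSON array of documents.
requests.post(SOLR_UPDATE, json=docs, timeout=60).raise_for_status()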

On Fri, Apr 15, 2016 at 2:18 PM, Matt Sherman 
wrote:

> Hi all,
>
> I am looking to pick the group brain as to what might be the most useful
> database software for a digital project I am collaborating on.  We are
> working on converting an annotated bibliography to a searchable database.
> While I have the data in a few structured formats, we need to figure out
> now what to actually put it in so that it can be queried.  My default line
> of thinking is to try MySQL, since it is free and used ubiquitously
> online, but I wanted to see if there were any other database or software
> systems that we should also consider before investing a lot of time in one
> approach.  Any advice and suggestions would be appreciated.
>
> Matt Sherman
>


Re: [CODE4LIB] Structured Data Markup on library web sites

2016-03-23 Thread Ethan Gruber
We embed schema.org properties in RDFa within metadata for ETDs in our
Digital Library application, e.g.,
http://numismatics.org/digitallibrary/ark:/53695/money_and_power_in_the_viking_kingdom_of_york

I don't know exactly how Google's algorithms establish "authority," but the
ETDs in our system usually show up in the first few results in
Google--usually above academia.edu. Part of the reason is probably our use
of schema.org, but part of the reason is also because of the authority
Google's algorithms have put into content on numismatics.org.

We use RDFa throughout our digital applications, though not always with
schema.org; some use classes and properties more relevant to archives or
coins. I think
that once the archival extension to schema.org is more formalized (Richard
Wallis is the driving force behind that discussion), we'll probably
implement that in our archives with EADitor (
https://github.com/ewg118/eaditor).

On Wed, Mar 23, 2016 at 9:05 AM, Jason Ronallo  wrote:

> Charlie,
>
> Since you've been here we've also added schema.org data for events:
> http://www.lib.ncsu.edu/event/red-white-black-walking-tour-4
>
> And for a long time we've used this for our special collections:
> http://d.lib.ncsu.edu/collections/catalog/mc00240-001-ff0093-001-001_0010
> And for videos on a few sites:
>
> http://d.lib.ncsu.edu/computer-simulation/videos/donald-e-knuth-interviewed-by-richard-e-nance-knuth
>
> Looking at it again now it could use some cleanup to trigger better
> rich snippets, but in the past it had been improving what our search
> results looked like.
>
> Jason
>
> On Wed, Mar 23, 2016 at 7:48 AM, Charlie Morris 
> wrote:
> > I can remember putting schema.org markup around the location information
> > for lib.ncsu.edu, and it's still there - check out the footer. One small
> > example anyway. I'm not sure that it's actually had any effect, though - I
> > don't see it in search engine results, and it's been there for
> > probably 2+ years now.
> >
> > On Tue, Mar 22, 2016 at 8:44 PM, Jennifer DeJonghe <
> > jennifer.dejon...@metrostate.edu> wrote:
> >
> >> Hello,
> >>
> >> I'm looking for examples of library web sites or university web sites
> that
> >> are using Structured Data / schema.org to mark up books, locations,
> >> events, etc, on their public web sites or blogs. I'm NOT really looking
> for
> >> huge linked data projects where large record sets are marked up, but
> more
> >> simple SEO practices for displaying rich snippets in search engine
> results.
> >>
> >> If you have examples of library or university websites doing this,
> please
> >> send me a link!
> >>
> >> Thank you,
> >> Jennifer
> >>
> >> Jennifer DeJonghe
> >> Librarian and Professor
> >> Library and Information Services
> >> Metropolitan State University
> >> St. Paul, MN
> >>
>


Re: [CODE4LIB] Listserv communication

2016-02-26 Thread Ethan Gruber
Nearly all of my professional communication occurs on Twitter, for better
or worse. I think that is probably the case for many of us. Code4lib is
very much alive, but perhaps has evolved into disparate conversations
taking place on Twitter instead of the listserv.

On Fri, Feb 26, 2016 at 10:07 AM, Shaun D. Ellis 
wrote:

>
> On Feb 26, 2016, at 8:42 AM, Julie Swierczek  > wrote:
>
> We also agreed that listservs – both here and elsewhere – seem to have
> shrinking participation over time, and there does seem to be a drive to
> pull more conversations out of the public eye.  There is no question that
> some matters are best discussed in private channels, such as feedback about
> individual candidates for duty officers, or matters pertaining to physical
> and mental well-being.  But when it comes to discussing technology or other
> professional matters, there seems to be a larger trend of more responses
> going off listservs.  (I, for one, generally do not reply to questions on
> listservs and instead reply to the OP privately because I’ve been burned too
> many times publicly.  The main listserv for archivists in the US has such a
> bad reputation for flaming that it has its own hashtag: #thatdarnlist.)
>
> Maybe we can brainstorm about common reasons for people not using the
> list: impostor syndrome (I don’t belong here and/or I certainly don’t have
> the right ‘authority’ to respond to this); fear of being judged - we see
> others being judged on a list (about the technological finesse of their
> response, for instance) so we don’t want to put ourselves in a position
> where we will be judged; fear of talking in general because we  have seen
> other people harmed for bringing their ideas to public forums (cf. doxing
> and swatting);  fear of looking stupid in general.
>
> Thank you for bringing this up, Julie.  I have been curious about this
> myself. I think you are correct in that there is some “impostor syndrome”
> involved, but my hypothesis is that there has been a lot of splintering of
> the channels/lists over the past several years that has dried up some of
> the conversation.  For one, there’s StackOverflow.  StackOverflow is more
> effective than a listserv on general tech questions because it requires you
> to ask questions in a way that is clear (with simple examples) and keeps
> answers on topic.  There has also been a move towards specific project
> lists so that more general lists like Code4Lib are not bombarded with
> discussions about project-related minutia that are only relevant to a
> certain sub-community.
>
> I don’t see this as a bad thing, as it allows Code4Lib to be a gathering
> hub among many different sub-groups.  But it can make it difficult to know
> what is appropriate to post and ask here. Code4Lib has always been about
> inspiration and curiosity to me. This is a place to be a free thinker, to
> question, to dissent, to wonder.  We have a long tradition of “asking
> anything” and we shouldn’t discourage that, but I think Code4Lib is a
> particularly good space to discuss bigger-picture tech-in-library
> issues/challenges as well as general best practices at a “techy” level.
> It’s certainly the appropriate space to inspire others with amazing
> examples of library tech that delights users. :)
>
> I have to admit that I was disappointed that the recent question about
> full-text searching basics (behind OregonDigital’s in-page highlighting of
> keywords in the IA Bookreader) went basically unanswered.  This was a
> well-articulated legitimate question, and at least a few people on this
> list should be able to answer it. It’s actually on my list to try to do it
> so that I can report back, but maybe someone could save me the trouble and
> quench our curiosity?
>
> Cheers,
> Shaun
>
>
>
>
>
>


Re: [CODE4LIB] TEI->EPUB serialization testing

2016-01-14 Thread Ethan Gruber
Thanks, Eric. Is the original code online anywhere? I will eventually write
some XSL-FO to generate PDFs for people who want those, for some reason.

On Thu, Jan 14, 2016 at 10:05 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

> On Jan 13, 2016, at 4:17 PM, Ethan Gruber <ewg4x...@gmail.com> wrote:
>
> > Part of this grant stipulates that open access books be made available
> in EPUB 3.0.1, so I got to work on a pipeline for dynamically serializing
> TEI into EPUB. It works pretty well, but there are some minor issues. The
> issues might be related more to differences between individual ereader apps
> in supporting the 3.0.1 spec than anything I might have done wrong in the
> serialization process (the file validates according to a script I've been
> running)…
> >
> > If you are interested in more information about the framework, there's
> http://eaditor.blogspot.com/2015/12/the-ans-digital-library-look-under-hood.html
> and
> http://eaditor.blogspot.com/2016/01/first-ebook-published-to-ans-digital.html.
> It's highly LOD aware and is capable of posting to a SPARQL endpoint so
> that information can be accessed from other archival frameworks and
> integrated into projects like Pelagios.
>
>
> I wrote a similar thing a number of years ago, and it was implemented as
> Alex Lite. [1] I started out with TEI files, and then transformed them into
> a number of derivatives: simple HTML, “cooler” HTML, PDF, and ePub. I think
> my ePub version was somewhere around 2.0. The “framework” was written in
> Perl, of course.  ;-)  The whole of Alex Lite was designed to be given
> away on CD or as an instant website. (“Just add water.”). The hard part of
> the whole thing was the creation of the TEI files in the first place. After
> that, everything was relatively easy.
>
> [1] Alex Lite blog posting - http://bit.ly/eazpJY
> [2] Alex Lite - http://infomotions.com/sandbox/alex-lite/
>
> —
> Eric Lease Morgan
> Artist- And Librarian-At-Large
>
> (A man in a trench coat approaches, and says, “Psst. Hey buddy, wanna buy
> a registration to the Code4Lib conference!?”)
>


[CODE4LIB] TEI->EPUB serialization testing

2016-01-13 Thread Ethan Gruber
Hi all,

I've been working on and off for a few months on a system for publishing
ebooks, ETDs, and other digital library materials online to a more
consolidated "Digital Library" application (
http://numismatics.org/digitallibrary). The framework (
https://github.com/AmericanNumismaticSociety/etdpub) was initially designed
for quick and easy PDF indexing and publication of ETDs, but has evolved into
a TEI publication framework for the NEH-Mellon Humanities Open Book Program
grant we received recently.

Part of this grant stipulates that open access books be made available in
EPUB 3.0.1, so I got to work on a pipeline for dynamically serializing TEI
into EPUB. It works pretty well, but there are some minor issues. The
issues might be related more to differences between individual ereader apps
in supporting the 3.0.1 spec than anything I might have done wrong in the
serialization process (the file validates according to a script I've been
running).

We published our first open access ebook today:
http://numismatics.org/digitallibrary/id/Miller-ANS-Medals. There's a link
on the right to the EPUB file. I would greatly appreciate any feedback you
can provide. I created a survey that will help in usability testing:
https://docs.google.com/forms/d/10Prvpm5eDvjNZaeqgXZ7luLeSkVrOgZ3hJX5zjFBuSg/viewform
.

There is a dearth of decent information about EPUB usability testing on the
web.

If you are interested in more information about the framework, there's
http://eaditor.blogspot.com/2015/12/the-ans-digital-library-look-under-hood.html
and
http://eaditor.blogspot.com/2016/01/first-ebook-published-to-ans-digital.html.
It's highly LOD aware and is capable of posting to a SPARQL endpoint so
that information can be accessed from other archival frameworks and
integrated into projects like Pelagios.

Ethan


[CODE4LIB] Fwd: [LODLAM] seeking LODLAM Workshop Leaders

2015-08-31 Thread Ethan Gruber
-- Forwarded message --
From: <jon.v...@shiftdesign.org.uk>
Date: Mon, Aug 31, 2015 at 4:20 PM
Subject: [LODLAM] seeking LODLAM Workshop Leaders
To: lod-...@googlegroups.com


Hey folks,
We've got some limited funding available to help support a number of
workshops over the next 18 months and we're looking for volunteers willing
to lead 1.5 hour hands-on workshops. Please take a minute to fill out the
below form if you're interested.

Thanks!
LODLAM workshop coordinators: Jon, Ethan, Anne

If you have trouble viewing or submitting this form, you can fill it out in
Google Forms
<https://docs.google.com/forms/d/1Az8ylu76m-bcSDhgYVf-31s2-OLOkHmvSx5k_I9a-no/viewform?c=0=1=mail_form_link>.


LODLAM Workshop Leaders Interest Form
We’re seeking volunteers to run workshops at a number of key conferences
throughout 2015-16 designed to teach the basics of Linked Open Data in
Libraries, Archives and Museums (LODLAM). In doing so, we hope to strengthen
a growing community of practitioners willing and able to build upon shared
cultural heritage and scientific data. The expectation is that we can
assemble and share the workshop plans and content to enable more and more
people to host and teach them over time.

In particular, we’re looking for people to teach tools in 1.5 hr sessions
with real data that address the following 4 categories:
--Big picture view: Introduce the basic concepts of LODLAM integrating
examples of what people are already doing with it.
--Cleaning: Use Open Refine to clean and reconcile datasets to make them
more usable for the public.
--Publishing: Demonstrate ways that people can publish datasets in the
library/archive/museum space - from publishing CSV’s and posting datasets
in Github to rdf’izing in Open Refine or using triplestores.
--Reusing and Building: Teach SPARQL as well as open source tools used to
visualize single or multiple collections.

Let us know if you can help!

Please contact Jon Voss (jon.v...@shiftdesign.org.uk), Ethan Gruber (
ewg4x...@gmail.com), or Anne Gaynor (amgayn...@gmail.com) if you have any
questions.

* Required

   Personal Information
   First Name *
   Last Name *
   Email address *
   Country *
   Affiliation
   Twitter handle
   Phone number
   What would you like to teach?
   Which sections are you interested in teaching? *
   - Big picture view: Introduce the basic concepts of LODLAM integrating
  examples of what people are already doing with it.
  - Cleaning: Use Open Refine to clean and reconcile datasets to make
  them more usable for the public.
   - Publishing: Demonstrate ways that people can publish datasets in
   the library/archive/museum space - from publishing CSV’s and posting
   datasets in Github to rdf’izing in Open Refine or using triplestores.
  - Reusing and Building: Teach SPARQL as well as open source tools
  used to visualize single or multiple collections.
   Tell us more about what you'd like to teach *
   What specific concepts, tools, languages, etc. can you teach? Is the
   tool free? Is it open source?
   Where could you teach?
   We're colocating these sessions with a number of conferences throughout
   2015-16. At which conference(s) would you be able to teach a session? NOTE:
   There will be very limited travel stipends, conference discounts, or
   honorariums available, so please keep that in mind as you select. You may
   want to select conferences nearby or one that you are already planning to
   attend.
   Select one or more places you would be able to teach: *
   - Digital Library Federation, Vancouver, BC: October 26-28, 2015
  - Archaeological Institute of America/Society of Classical Studies,
  San Francisco, CA: January 6-9, 2016
  - code4lib, Philadelphia, PA: March 7-10, 2016
  - Electronic Resources & Libraries, Austin, TX: April 3-6, 2016
  - Museums and the Web 2016, Los Angeles, CA: April 6-9, 2016
  - DPLA Fest, Location TBD: mid-April 2016
  - Society of American Archivists, Atlanta, GA: July 31-August 6 2016
  - Smart Data, San Jose, CA: August 2016
  - Dublin Core Metadata Initiative / Society for Information Science
  and Technology, Copenhagen, Denmark: October 13-16, 2016
   Other participation
   Would you be willing to join a list a speakers available to institutions
   looking to bring specialists to run workshops? *
   - Yes!
  - No thanks
  - Let me think about it...
   Anything else?
   Any other thoughts? Ideas? Questions?

Re: [CODE4LIB] XSLT Advice

2015-06-02 Thread Ethan Gruber
You really just need to wrap the label (the xsl:text) and the xsl:value-of
in an xsl:if that tests whether the value-of XPath returns a string.

<dc:identifier>
    <xsl:value-of select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='name']/doc:element/doc:field[@name='value']"/>

    <xsl:if test="string(doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value'])">
        <xsl:text> Vol. </xsl:text>
        <xsl:value-of select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value']"/>
    </xsl:if>

    <xsl:if test="string(doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value'])">
        <xsl:text> Issue </xsl:text>
        <xsl:value-of select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value']"/>
    </xsl:if>
</dc:identifier>

If there's no name at all, you'd want to wrap an xsl:if around the
dc:identifier so that you suppress an empty dc:identifier element.

On Tue, Jun 2, 2015 at 3:34 PM, Matt Sherman matt.r.sher...@gmail.com
wrote:

 Cool.  I talked to Ron via phone so I am getting a better picture, but
 I am still happy to take more insights.

 So the larger context.  I inherited a DSpace instance with three
 custom metadata fields which actually have some useful publication
 information, though they improperly titled them by associating them
 with a dc prefix; there were too many to fix quickly and they
 haven't broken DSpace yet, so we continue.  So I added to the XSL to
 pull the data within the custom fields to display "[publication
 name] Vol. [publication volume] Issue [publication issue]".  That
 worked really well until I realized that there was no conditional, so
 even when the fields are empty I still get:
 <dc:identifier>Vol.  Issue </dc:identifier>

 So here are the Custom Metadata fields:

 dc.publication.issue
 dc.publication.name
 dc.publication.volume


 Here is the customized XSLT, with dc.identifier added for context of
 what the rest of the sheet looks like.

 <!-- dc.identifier -->
 <xsl:for-each select="doc:metadata/doc:element[@name='dc']/doc:element[@name='identifier']/doc:element/doc:field[@name='value']">
     <dc:identifier><xsl:value-of select="."/></dc:identifier>
 </xsl:for-each>

 <!-- dc.identifier.* -->
 <xsl:for-each select="doc:metadata/doc:element[@name='dc']/doc:element[@name='identifier']/doc:element/doc:element/doc:field[@name='value']">
     <dc:identifier><xsl:value-of select="."/></dc:identifier>
 </xsl:for-each>

 <!-- dc.publication fields to dc.identifier -->
 <dc:identifier><xsl:value-of
     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='name']/doc:element/doc:field[@name='value']"/><xsl:text> Vol. </xsl:text><xsl:value-of
     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value']"/><xsl:text> Issue </xsl:text><xsl:value-of
     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value']"/></dc:identifier>


 Ron suggested using choose and when, and that does seem to make
 the most sense.  The other trickiness is that I have found that some
 of these fields are filled when others are blank, such as there being a
 volume but not an issue.  So I need to figure out how to test multiple
 fields so that I can have it display differently depending on which have
 data, or not display at all when none of the fields are filled, which is
 the case for items such as posters.

 So any thoughts would help.  Thanks.

 On Tue, Jun 2, 2015 at 2:50 PM, Wick, Ryan ryan.w...@oregonstate.edu
 wrote:
  I agree with Stuart, post the example here.
 
  Or if you want more real-time chat there's always #code4lib IRC.
 
  For an XSLT resource, Dave Pawson's site is great:
 http://www.dpawson.co.uk/xsl/sect2/sect21.html
 
  Ryan Wick
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Stuart A. Yeates
  Sent: Tuesday, June 02, 2015 11:46 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] XSLT Advice
 
  There are a number of experienced xslt'ers here. Post your example to
 the group so we can all learn.
 
  Cheers
  Stuart
 
  On Wednesday, June 3, 2015, Matt Sherman matt.r.sher...@gmail.com
 wrote:
 
  Hi all,
 
  I am making a few corrections on an oai_dc.xslt file for our DSpace
  instance I slightly botched modifying to integrate some custom
  metadata into a dc.identifier citation in the OAI-PMH harvest.  I need
  to get proper conditionals so it can display and harvest the metadata
  correctly and not run when there is no data in those fields.  I have a
  pretty good idea what I need to do, and if this were like JavaScript
  or Python I could probably muddle 

Re: [CODE4LIB] Library Hours

2015-05-06 Thread Ethan Gruber
+1 on the RDFa and schema.org. For those that don't know the library URL
off-hand, it is much easier to find a library website by Googling than it
is to go through the central university portal, and the hours will show up
at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

 Note that library hours is one of the possible bits of information that
 could be encoded as RDFa in the library web site, thus making it possible
 to derive library hours directly from the listing of hours on the web site
 rather than keeping a separate list. Schema.org does have the elements such
 that hours can be encoded. This would mean that hours could show in the
 display of the library's catalog entry on Google, Yahoo and Bing. Being
 available directly through the search engines might be sufficient, not
 necessitating creating yet-another-database for that data.

 Schema.org uses a restaurant as its opening hours example, but much of the
 data would be the same for a library:

 <div vocab="http://schema.org/" typeof="Restaurant">
   <span property="name">GreatFood</span>
   <div property="aggregateRating" typeof="AggregateRating">
     <span property="ratingValue">4</span> stars -
     based on <span property="reviewCount">250</span> reviews
   </div>
   <div property="address" typeof="PostalAddress">
     <span property="streetAddress">1901 Lemur Ave</span>
     <span property="addressLocality">Sunnyvale</span>,
     <span property="addressRegion">CA</span> <span property="postalCode">94086</span>
   </div>
   <span property="telephone">(408) 714-1489</span>
   <a property="url" href="http://www.dishdash.com">www.greatfood.com</a>
   Hours:
   <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
   <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
   <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
   Categories:
   <span property="servesCuisine">
     Middle Eastern
   </span>,
   <span property="servesCuisine">
     Mediterranean
   </span>
   Price Range: <span property="priceRange">$$</span>
   Takes Reservations: Yes
 </div>

 It seems to me that using schema.org would get more bang for the buck --
 it would get into the search engines and could also be aggregated into
 whatever database is needed. As we've seen with OCLC, having a separate
 listing is likely to mean that the data will be out of date.

 kc

 On 5/5/15 2:19 PM, nitin arora wrote:

 I can't see that they distinguished between public libraries and other types on
 their campaign page.

 They say "all libraries" as far as I can see.
 So I suppose then that this is true for all libraries:
 "Libraries offer a space anyone can enter, where money isn't exchanged,
 and documentation doesn't have to be shown."
 Who knew fines and library/student-IDs were a thing of the past?

 The only data sets I can find where they got the 17,000 number is for
 public libraries:
 http://www.imls.gov/research/pls_data_files.aspx
 Maybe I missed something.
 There is an "hours" field on one of the CSVs I downloaded, etc., for 2012 data
 (the most recent I could find).

 Asking 10k for something targeted for completion in June and without a
 grasp on what types of libraries there are and how volatile the hours
 information is (especially in crisis) ...
 Sounds naive at best, sketchy at worst.

 The flexible funding button says this campaign will receive all funds
 raised even if it does not reach its goals.

 The value of these places for youth cannot be underestimated.
 So is the value of a quick buck ...

 On Tue, May 5, 2015 at 4:53 PM, McCanna, Terran 
 tmcca...@georgialibraries.org wrote:

  I'm not at all surprised that this doesn't already exist, and even if
 OCLC's was available, I'd be willing to bet it was out of date.

 Public library hours, especially in underfunded areas, may fluctuate
 depending on funding cycles, seasons (whether school is in or out), etc.,
 not to mention closing/reopening/moving because of old buildings that
 need
 to be updated. We have around 280 locations in our consortium and we have
 to rely on self-reporting to find out if their hours change. We certainly
 don't have staff time to check every one of their web sites on a regular
 basis, I can't imagine keeping track of 17,000!


 Terran McCanna
 PINES Program Manager
 Georgia Public Library Service
 1800 Century Place, Suite 150
 Atlanta, GA 30345
 404-235-7138
 tmcca...@georgialibraries.org


 - Original Message -
 From: Peter Murray jes...@dltj.org
 To: CODE4LIB@LISTSERV.ND.EDU
 Sent: Tuesday, May 5, 2015 4:36:56 PM
 Subject: Re: [CODE4LIB] Library Hours

 OCLC has an institutional registry [1], which had (in part) library
 hours,
 addresses, and so forth.  It seems to be unavailable, though [2].  That
 is
 the only systematic collection of library hours data that I know about.


 Peter

 [1] https://www.oclc.org/worldcat-registry.en.html
 [2] https://www.worldcat.org/registry/institution/

  On May 5, 2015, at 4:16 PM, Bigwood, 

Re: [CODE4LIB] Restrict solr index results based on client IP

2015-01-07 Thread Ethan Gruber
There are a few ways to do this, and yes, some version of #2 is desirable.
I think it may depend on how specific these IP addresses are. Do you
anticipate that one IP range may have access to X documents and a different
IP range may have access to Y documents, or will all IP ranges have access
to the same restricted documents (i.e., anyone on campus can access
everything)? The former scenario requires IPs to be stored in the Solr docs,
and the second only requires a boolean field type, e.g. restricted =
yes/no. In fact, in the former scenario, you'd probably want to associate
the IP range with a key of some sort, e.g.

In the schema, have <field name="group" .../>

In your doc have the group field contain the value medical_school. Then
somewhere in your application (not stored and indexed in Solr), you can say
that medical_school carries the ranges 192.168.1.*, 192.168.2.*, etc.
That way, if the medical school picks up a new IP range or the range
changes, you can make a minor update to your application without having to
reindex content in Solr.
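
A sketch of how that might look at the application layer (Solr URL, group names, and IP ranges are all hypothetical): resolve the client IP to zero or more group keys, then filter on the group field, so that changing a range means editing the map rather than reindexing.

import ipaddress
import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"   # assumed core
GROUPS = {
    # CIDR equivalents of ranges like 192.168.1.*, 192.168.2.*
    "medical_school": ["192.168.1.0/24", "192.168.2.0/24"],
}

def groups_for(client_ip):
    ip = ipaddress.ip_address(client_ip)
    return [name for name, ranges in GROUPS.items()
            if any(ip in ipaddress.ip_network(r) for r in ranges)]

def search(q, client_ip):
    # Documents restricted to a group carry that key in "group"; open documents
    # carry a default key such as "public" (an assumption of this sketch).
    allowed = ["public"] + groups_for(client_ip)
    fq = "group:({0})".format(" OR ".join(allowed))
    params = {"q": q, "fq": fq, "wt": "json"}
    return requests.get(SOLR_URL, params=params, timeout=30).json()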

Ethan

On Wed, Jan 7, 2015 at 11:41 AM, Chad Mills cmmi...@rci.rutgers.edu wrote:

 Hello,

 Basically I have a solr index where, at times, some of the results from a
 query will only be limited to a set of users based on their clients IP
 address.  I have been thinking about accomplishing this in either two ways.

 1) Post-processing the results for IP validity against an external data
 source and dropping out those results which are not valid.  That could
 leave me with a portioned result list that would need another query to fill
 back in.  Say I want 10 results, I end up dropping 2 of them, I need to
 fill back in those 2 by performing another query.

 2) Making the IP permission check part of the query.  Basically appending
 an AND in the query on a field that stores the permissible IP addresses.
 The index field would be set to allow all IPs to access the result by
 default, but at times can contain the allowable IP addresses or maybe even
 ranges somehow.

 Are there some other ways to accomplish this I haven't considered?  Right
 now #2 seems more desirable to me.

 Thanks in advance for your thoughts!

 --
 Chad Mills
 Digital Library Architect
 Ph: 848.932.5924
 Fax: 848.932.1386
 Cell: 732.309.8538

 Rutgers University Libraries
 Scholarly Communication Center
 Room 409D, Alexander Library
 169 College Avenue, New Brunswick, NJ 08901

 https://rucore.libraries.rutgers.edu/



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Ethan Gruber
I recently extended Fuseki to hook into a Solr index for geographic query
for one of our linked data projects, and I'm happy with the results so far.
It will open the door for us to build more sophisticated geographic
visualizations. I have not extended Fuseki for Lucene/Solr based full text
search, as we have a standalone Solr index for that, and a separate search
interface (for general users) from the SPARQL query interface (for advanced
ones).

It's definitely true that there are scaling limitations in SPARQL--just
look at how often dbpedia and the British Museum SPARQL endpoint go down.
Hardware is overcoming these limitations, but I still advocate a hybrid
approach: using Solr where it is advantageous to do so, and then building
focused user interfaces on top of SPARQL, leveraging the advantages of a
triplestore in contexts other than search. We open up our SPARQL endpoint
to the public, but far more users interact with SPARQL through HTML
interfaces in several different projects without having any idea that they
are doing so. We only have about a million triples in our triplestore (but
this is going to grow enormously in less than two years, I think, as the
floodgates are about to open in the world of ancient Greco-Roman coins),
but the system has only gone down for about 2 minutes in the last 2.5
years, on a virtual machine with only 4GB of memory.
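
For anyone wondering what "building a focused interface on top of SPARQL" looks like in code, here is a minimal sketch (endpoint URL and query are hypothetical) of an application layer querying a Fuseki endpoint with the SPARQLWrapper library:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/dataset/query")  # assumed Fuseki endpoint
sparql.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# The application renders these bindings into HTML; end users never see SPARQL.
for binding in results["results"]["bindings"]:
    print(binding["concept"]["value"], binding["label"]["value"])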

Ethan

On Fri, Dec 19, 2014 at 10:20 AM, Mixter,Jeff mixt...@oclc.org wrote:

 A triplestore is basically a database backend for RDF triples. The major
 benefit is that it allows for SPARQL querying. You could imagine a
 triplestore as being the same thing as a relational database that can be
 queried with SQL.

 The drawback that I have run into is that unless you have unlimited
 hardware, triplestores can run into scaling problems (when you are looking
 at hundreds of millions or billions of triples). This is a problem when you
 want to search for data. For searching I use a hybrid Elasticsearch (i.e.
 Lucene) index for the string literals and then go out to the triplestore to
 query for the data.

 If you are looking to use a triplestore it is important to distinguish
 between search and query.

 Triplestores are really good for query but not so good for search. The
 basic problem with search is that it is mostly string-based, and this
 requires a regular expression query in SPARQL, which is expensive from a
 hardware perspective.

 There are a few triple stores that use a hybrid model. In particular Jena
 Fuseki (http://jena.apache.org/documentation/query/text-query.html)

 Thanks,

 Jeff Mixter
 Research Support Specialist
 OCLC Research
 614-761-5159
 mixt...@oclc.org

 
 From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest,
 Stuart sforr...@bcgov.net
 Sent: Friday, December 19, 2014 10:00 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] rdf triplestores

 Hi All

 My question is what do you guys use triplestores for?

 Thanks
 Stuart



 
 Stuart Forrest PhD
 Library Systems Specialist
 Beaufort County Library
 843 255 6450
 sforr...@bcgov.net

 http://www.beaufortcountylibrary.org

 For Leisure, For Learning, For Life



 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Stefano Bargioni
 Sent: Monday, November 11, 2013 8:53 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] rdf triplestores

 My +1 for Joseki.
 sb

 On 11/nov/2013, at 06.12, Eric Lease Morgan wrote:

  What is your favorite RDF triplestore?
 
  I am able to convert numerous library-related metadata formats into
 RDF/XML. In a minimal way, I can then contribute to the Semantic Web by
 simply putting the resulting files on an HTTP file system. But if I were to
 import my RDF/XML into a triplestore, then I could do a lot more. Jena
 seems like a good option. So does Openlink Virtuoso.
 
  What experience do y'all have with these tools, and do you know how to
 import RDF/XML into them?
 
  --
  Eric Lease Morgan
 



Re: [CODE4LIB] Functional Archival Resource Keys

2014-12-09 Thread Ethan Gruber
I'm using a few applications in Tomcat, so inflections are much more
difficult to implement than content negotiation. I can probably tweak the
Apache settings to do a proxypass for inflections by modifying the examples
above.

I agree with Conal, though. Inflections are puzzling at best and bad
architecture at worst, and the sooner the community puts forward a more
standard solution, the better.

On Mon, Dec 8, 2014 at 7:21 PM, John Kunze j...@ucop.edu wrote:

 Just as a URL permits an ordinary user with a web browser to get to an
 object, inflections permit an ordinary user to see metadata (without curl
 or code).

 There's nothing to prevent a server from supporting both the HTTP Accept
 header (content negotiation) and inflections.  If you can do the one, the
 other should be pretty easy.
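
As a hedged illustration of serving both from one handler (routes, record store, and fields are all hypothetical, written here in Python/Flask): the '??' inflection arrives as a query string consisting of a single '?', while a bare '?' never reaches a WSGI application at all, which is why it is not handled below.

from flask import Flask, jsonify, request

app = Flask(__name__)

RECORDS = {"12345/abc": {"who": "Example Author", "what": "Example Title"}}  # made up

@app.route("/ark:/<path:naan_name>")
def ark(naan_name):
    record = RECORDS.get(naan_name)
    if record is None:
        return "not found", 404
    wants_metadata = (
        request.query_string == b"?"   # the '??' inflection
        or request.accept_mimetypes.best_match(
            ["application/json", "text/html"]) == "application/json"  # conneg
    )
    if wants_metadata:
        return jsonify(record)         # brief metadata record
    return "<html><body>object landing page</body></html>"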

 On Mon, Dec 8, 2014 at 4:01 PM, Conal Tuohy conal.tu...@gmail.com wrote:

  I am really puzzled by the use of these non-standard inflexions as a
  means of qualifying an HTTP request. Why not use the HTTP Accept header,
  like everyone else?
 
 
  On 9 December 2014 at 07:59, John A. Kunze j...@ucop.edu wrote:
 
   Any Apache server (not Tomcat) can handle the '?' and '??' cases with a
   few rewrite rules to transform them into typical CGI-like query
 strings.
  
  # Detect ? and ?? inflections and map to typical CGI-style parameters.
  # One question mark case:  ?  -> ?show=brief&as=anvl/erc
  RewriteCond %{THE_REQUEST}  \?
  RewriteCond %{QUERY_STRING} ^$
  RewriteRule ^(.*)$ $1?show=brief&as=anvl/erc

  # Two question mark case:  ?? -> ?show=support&as=anvl/erc
  RewriteCond %{QUERY_STRING} ^\?$
  RewriteRule ^(.*)$ $1?show=support&as=anvl/erc
  
   So if your architecture supports query strings of the form
  
  ?name1=value1&name2=value2...
  
   it can support ARK inflections.
  
I don't believe that the ARK spec and HTTP URIs are fully compatible
   ideas.
  
  
   True.  A '?' by itself has no meaning in the URI spec, which means it's
   also an opportunity to do something intuitive and important with an
   unused portion of the instruction space (of strings that start out
   looking like URLs).  Any URLs (not just ARKs) could support this.
  
   The THUMP spec (where inflections really live) will be modified to
   require an extra HTTP response header to indicate that the server is
   responding to an inflection and not to a standard URI query string.
   This could help in the '??' case, which actually could be interpreted
   as a valid URI query string.
  
   -John
  
  
  
   --- On Mon, 8 Dec 2014, Ethan Gruber wrote:
  
   Thanks for the info. I'm glad I'm not the only person struggling with
   this.
   I'm not entirely sure my architecture will allow me to append question
   marks in this way (two question marks is probably feasible, but it
  doesn't
   appear that one is). I don't believe that the ARK spec and HTTP URIs
 are
   fully compatible ideas. Hopefully some clearer request parameter or
   content
   negotiation standards emerge.
  
   Ethan
  
   On Sat, Dec 6, 2014 at 10:23 AM, Phillips, Mark 
 mark.phill...@unt.edu
   wrote:
  
Ethan,
  
   As Mark mentioned we have implemented the ARK inflections of ? and ??
   with
   our systems.
  
   I remember the single ? being a bit of a problem to implement in our
   system stack (Apache/mod_python/Django) and from what I can tell
 isn't
   possible with (Apache/mod_wsgi/Django) at all.
  
   The ?? inflection wasn't really a problem for us on either of the
   systems.
  
   From conversations I've had with implementors of ARK,  the issues
  around
   supporting the ? and ?? inflections don't seem to be related to the
   frameworks issues as other issues like commitment to identifiers, the
   fact
   that ARKs are being used in a redirection based system like Handles,
 or
   the
   challenges of accessing the item metadata for items elsewhere in
 their
   system.
  
   I think having a standard set of request parameters or other url
   conventions could be beneficial to the implementation of these
 features
   by
   others.
  
   Mark
   
   From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of
   todd.d.robb...@gmail.com todd.d.robb...@gmail.com
   Sent: Saturday, December 6, 2014 8:23 AM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] Functional Archival Resource Keys
  
   This brief exchange on Twitter seems relevant:
  
   https://twitter.com/abrennr/status/296948733147508737
  
   On Fri, Dec 5, 2014 at 12:50 PM, Mark A. Matienzo 
   mark.matie...@gmail.com
  
  
wrote:
  
Hi Ethan,
  
   I'm hoping Mark Phillips or one of his colleagues from UNT will
  respond,
   but they have implemented ARK inflections. For example, compare:
  
   http://texashistory.unt.edu/ark:/67531/metapth5828/
   http://texashistory.unt.edu/ark:/67531/metapth5828/?
   http://texashistory.unt.edu/ark:/67531/metapth5828/??
  
   In particular, the challenges posed

Re: [CODE4LIB] Functional Archival Resource Keys

2014-12-08 Thread Ethan Gruber
Thanks for the info. I'm glad I'm not the only person struggling with this.
I'm not entirely sure my architecture will allow me to append question
marks in this way (two question marks is probably feasible, but it doesn't
appear that one is). I don't believe that the ARK spec and HTTP URIs are
fully compatible ideas. Hopefully some clearer request parameter or content
negotiation standards emerge.

Ethan

On Sat, Dec 6, 2014 at 10:23 AM, Phillips, Mark mark.phill...@unt.edu
wrote:

 Ethan,

 As Mark mentioned we have implemented the ARK inflections of ? and ?? with
 our systems.

 I remember the single ? being a bit of a problem to implement in our
 system stack (Apache/mod_python/Django) and from what I can tell isn't
 possible with (Apache/mod_wsgi/Django) at all.

 The ?? inflection wasn't really a problem for us on either of the systems.

 From conversations I've had with implementors of ARK, the issues around
 supporting the ? and ?? inflections don't seem to be related to framework
 issues so much as to other issues like commitment to identifiers, the fact
 that ARKs are being used in a redirection-based system like Handles, or the
 challenges of accessing the item metadata for items elsewhere in their
 system.

 I think having a standard set of request parameters or other url
 conventions could be beneficial to the implementation of these features by
 others.

 Mark
 
 From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of
 todd.d.robb...@gmail.com todd.d.robb...@gmail.com
 Sent: Saturday, December 6, 2014 8:23 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Functional Archival Resource Keys

 This brief exchange on Twitter seems relevant:

 https://twitter.com/abrennr/status/296948733147508737

 On Fri, Dec 5, 2014 at 12:50 PM, Mark A. Matienzo mark.matie...@gmail.com
 
 wrote:

  Hi Ethan,
 
  I'm hoping Mark Phillips or one of his colleagues from UNT will respond,
  but they have implemented ARK inflections. For example, compare:
 
  http://texashistory.unt.edu/ark:/67531/metapth5828/
  http://texashistory.unt.edu/ark:/67531/metapth5828/?
  http://texashistory.unt.edu/ark:/67531/metapth5828/??
 
  In particular, the challenges posed by inflections are described in this
  DC2014 paper [0] by Sébastien Peyrard and Jean-Philippe Tramoni from the
  BNF and John A. Kunze from CDL.
 
  [0] http://dcpapers.dublincore.org/pubs/article/view/3704/1927
 
  Cheers,
  Mark
 
 
  --
  Mark A. Matienzo m...@matienzo.org
  Director of Technology, Digital Public Library of America
 
  On Fri, Dec 5, 2014 at 2:36 PM, Ethan Gruber ewg4x...@gmail.com wrote:
 
   I was recently reading the wikipedia article for Archival Resource Keys
   (ARKs, http://en.wikipedia.org/wiki/Archival_Resource_Key), and there
  was
   a
   bit of functionality that a resource is supposed to deliver that we
 don't
   in our system, nor do any other systems that I've seen that implement
 ARK
   URIs.
  
   From the article:
  
   An ARK contains the label *ark:* after the URL's hostname, which sets
  the
   expectation that, when submitted to a web browser, the URL terminated
 by
   '?' returns a brief metadata record, and the URL terminated by '??'
  returns
   metadata that includes a commitment statement from the current service
   provider.
  
   Looking at the official documentation (
   https://confluence.ucop.edu/display/Curation/ARK), they provided an
   example
   of http://ark.cdlib.org/ark:/13030/tf5p30086k? which is supposed to
  return
   something called an Electronic Resource Citation, but it doesn't work.
   Probably because, and correct me if I'm wrong, using question marks in
 a
   URL in this way doesn't really work in HTTP.
  
   So, has anyone successfully implemented this? Is it even worth it? I'm
  not
   sure I can even implement this in my own architecture.
  
   Maybe it would be better to recommend a standard set of request
  parameters
   that actually work in REST?
  
   Ethan
  
 



 --
 Tod Robbins
 Digital Asset Manager, MLIS
 todrobbins.com | @todrobbins http://www.twitter.com/#!/todrobbins



[CODE4LIB] Functional Archival Resource Keys

2014-12-05 Thread Ethan Gruber
I was recently reading the wikipedia article for Archival Resource Keys
(ARKs, http://en.wikipedia.org/wiki/Archival_Resource_Key), and there was a
bit of functionality that a resource is supposed to deliver that we don't
deliver in our system, nor do any other systems that I've seen that implement
ARK URIs.

From the article:

An ARK contains the label *ark:* after the URL's hostname, which sets the
expectation that, when submitted to a web browser, the URL terminated by
'?' returns a brief metadata record, and the URL terminated by '??' returns
metadata that includes a commitment statement from the current service
provider.

Looking at the official documentation (
https://confluence.ucop.edu/display/Curation/ARK), they provided an example
of http://ark.cdlib.org/ark:/13030/tf5p30086k? which is supposed to return
something called an Electronic Resource Citation, but it doesn't work.
Probably because, and correct me if I'm wrong, using question marks in a
URL in this way doesn't really work in HTTP.

So, has anyone successfully implemented this? Is it even worth it? I'm not
sure I can even implement this in my own architecture.

Maybe it would be better to recommend a standard set of request parameters
that actually work in REST?

Ethan


Re: [CODE4LIB] Reconciling corporate names?

2014-09-26 Thread Ethan Gruber
I would check with the developers of SNAC (
http://socialarchive.iath.virginia.edu/), as they've spent a lot of time
developing named entity recognition scripts for personal and corporate
names. They might have something you can reuse.
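
Another lightweight option, offered only as a sketch: id.loc.gov has a known-label lookup that (assuming it still behaves as documented) redirects to the authority URI when a string exactly matches an authorized heading, which makes a first pass over 40,000 strings fairly mechanical. The file names below are hypothetical.

import csv
import requests

BASE = "http://id.loc.gov/authorities/names/label/"

def lcnaf_uri(name):
    # A redirect means the exact string is an authorized form; 404 means no match.
    r = requests.get(BASE + name, allow_redirects=False, timeout=30)
    if r.status_code in (301, 302, 303):
        return r.headers.get("Location")
    return None

with open("corporate_names.txt") as infile, open("matches.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for line in infile:
        name = line.strip()
        writer.writerow([name, lcnaf_uri(name) or ""])

Strings that fail the exact-match pass would still need fuzzier reconciliation (OpenRefine, VIAF, or the SNAC scripts mentioned above).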

Ethan

On Fri, Sep 26, 2014 at 3:47 PM, Galligan, Patrick pgalli...@rockarch.org
wrote:

 I'm looking to reconcile about 40,000 corporate names against LCNAF to see
 whether they are authorized strings or not, but I'm drawing a blank about
 how to get it done.

 I've used http://freeyourmetadata.org/ for reconciling subject headings
 before, but I can't get it to work for LCNAF. Has anyone had any experience
 in a project like this? I'd love to hear some ideas for automatically
 dealing with a large data set like this that we did not create and do not
 know how the names were created.

 Thanks!

 -Patrick Galligan



[CODE4LIB] xEAC advanced beta / pre-production release ready for further testing

2014-08-29 Thread Ethan Gruber
Hi all,

xEAC (https://github.com/ewg118/xEAC), an open source, XForms-based
framework for the creation and publication of EAC-CPF records (for archival
authorities or scholarly prosopographies) is now ready for another round of
testing. While xEAC is still under development, it is essentially
production-ready for small-to-medium collections of authority records (fewer
than 100,000).

xEAC handles the majority of the elements in the EAC-CPF schema, with
particular focus on enhancing controlled vocabulary with external linked
open data systems and the semantic linking of relations between entities.
The following LOD lookup mechanisms are supported:

Geography: Geonames, LCNAF, Getty TGN, Pleiades Gazetteer of Ancient Places
Occupations/Functions: Getty AAT
Misc. linking and data import: VIAF, DBpedia, nomisma.org, and SNAC

xEAC supports transformation of EAC-CPF into a rudimentary form of three
different RDF models and posting data into an RDF triplestore by optionally
connecting the system to a SPARQL endpoint. Additionally, EADitor (
https://github.com/ewg118/eaditor), an open source framework for EAD
finding aid creation and publication, can hook into an xEAC installation for
controlled vocabulary as well as posting to a triplestore, making it
possible to link archival authorities and content through LOD methodologies.

The recently released American Numismatic Society biographies (
http://numismatics.org/authorities/) and the new version of the archives (
http://numismatics.org/archives/) illustrate this architecture. For
example, the authority record for Edward T. Newell (
http://numismatics.org/authority/newell), contains a dynamically generated
list of archival resources (from a SPARQL query). This method is more
scalable and sustainable in the long run than using the EAC
resourceRelation element. Now that SPARQL has successfully been implemented
in xEAC, I will begin to integrate social network analysis interfaces into
the application.
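
A query of roughly this shape is all that such a list requires (the property
here is illustrative, not copied from the production triplestore):

  PREFIX dcterms: <http://purl.org/dc/terms/>

  SELECT ?resource ?title WHERE {
    ?resource dcterms:creator <http://numismatics.org/authority/newell> ;
              dcterms:title ?title .
  }
  ORDER BY ?title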

More information:
Github repository: https://github.com/ewg118/xEAC
XForms for Archives, a blog detailing xEAC and EADitor development, as well
as linked data methodologies applied to archival collections:
http://eaditor.blogspot.com/
xEAC installation instructions: http://wiki.numismatics.org/xeac:xeac

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] Creating a Linked Data Service

2014-08-07 Thread Ethan Gruber
I agree with others saying linked data is overkill here. If you don't have
an audience in mind or a specific purpose for implementing linked data,
it's not worth it.
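
If the goal is just "10 of 12 laptops available" on the home page, a tiny JSON
endpoint is probably all that's needed. A hypothetical sketch (Flask, with an
invented table name -- substitute whatever your ILS actually exposes):

  import sqlite3
  from flask import Flask, jsonify

  app = Flask(__name__)

  @app.route("/availability/<item_type>")
  def availability(item_type):
      # a periodic snapshot exported from the ILS, not the live ILS database
      db = sqlite3.connect("ils_snapshot.db")
      total, out = db.execute(
          "SELECT COUNT(*), SUM(checked_out) FROM equipment WHERE type = ?",
          (item_type,)).fetchone()
      db.close()
      return jsonify(type=item_type, total=total or 0,
                     available=(total or 0) - (out or 0))

A few lines of JavaScript can poll that; wrapping the same response in JSON-LD
later is a small step if a real audience for linked data ever appears.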


On Thu, Aug 7, 2014 at 9:07 AM, Jason Stirnaman jstirna...@kumc.edu wrote:

 Mike,
 Check out
 http://json-ld.org/,
 http://json-ld.org/primer/latest/, and
 https://github.com/digitalbazaar/pyld

 But, if you haven't yet sketched out a model for *your* data, then the LD
 stuff will just be a distraction. The information on Linked Data seems
 overly complex because trying to represent data for the Semantic Web gets
 complex - and verbose.

 As others have suggested, it's never a bad idea to just do the simplest
 thing that could possibly work.[1] Mark recommended writing a simple API.
 That would be a good start to understanding your data model and to
 eventually serving LD. And, you may find that it's enough for now.

 1. http://www.xprogramming.com/Practices/PracSimplest.html

 Jason

 Jason Stirnaman
 Lead, Library Technology Services
 University of Kansas Medical Center
 jstirna...@kumc.edu
 913-588-7319

 On Aug 6, 2014, at 1:45 PM, Michael Beccaria mbecca...@paulsmiths.edu
 wrote:

  I have recently had the opportunity to create a new library web page and
 host it on my own servers. One of the elements of the new page that I want
 to improve upon is providing live or near live information on technology
 availability (10 of 12 laptops available, etc.). That data resides on my
 ILS server and I thought it might be a good time to upgrade the bubble gum
 and duct tape solution I now have by creating a real linked data service
 that would provide that availability information to the web server.
 
  The problem is there is a lot of overly complex and complicated
 information out there on linked data and RDF and the semantic web etc. and
 I'm looking for a simple guide to creating a very simple linked data
 service with php or python or whatever. Does such a resource exist? Any
 advice on where to start?
  Thanks,
 
  Mike Beccaria
  Systems Librarian
  Head of Digital Initiative
  Paul Smith's College
  518.327.6376
  mbecca...@paulsmiths.edu
  Become a friend of Paul Smith's Library on Facebook today!



Re: [CODE4LIB] OAI Crosswalk XSLT

2014-07-11 Thread Ethan Gruber
The source model seems inordinately complex.
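
For anyone else puzzling over the path, the select is walking DSpace's nested
wrapper format, which looks roughly like this (a reconstructed illustration,
not an exact DSpace response):

  <doc:metadata>
    <doc:element name="dc">
      <doc:element name="type">
        <doc:element name="en">            <!-- language wrapper -->
          <doc:element>
            <doc:field name="value">Article</doc:field>
          </doc:element>
        </doc:element>
      </doc:element>
    </doc:element>
  </doc:metadata>

Each step in the select corresponds to one wrapper layer, and the for-each
emits a dc:type for every doc:field[@name='value'] it finds.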


On Fri, Jul 11, 2014 at 10:53 AM, Matthew Sherman matt.r.sher...@gmail.com
wrote:

 I guess it is the doc:element/doc:element/doc:field thing that is mostly
 what is throwing me.


 On Fri, Jul 11, 2014 at 10:52 AM, Dunn, Katie dun...@rpi.edu wrote:

  Hi Matt,
 
  The W3C Recommendation for XPath has some good explanation and examples
  for abbreviated XPath syntax here: http://www.w3.org/TR/xpath-30/#abbrev
 
  Katie
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Matthew Sherman
  Sent: Friday, July 11, 2014 10:39 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] OAI Crosswalk XSLT
 
  Hi Code4Lib folks,
 
  I have a question for those of you who have worked with OAI-PMH.  I am
  currently editing our DSpace OAI crosswalk to include a few custom
 metadata
  field that exist in our repository for publication information and port
  them into a more standard format.  The problem I am running into is the
  select statements they use are not the typical XPath statements I am used
  to.  For example:
 
  <xsl:for-each
    select="doc:metadata/doc:element[@name='dc']/doc:element[@name='type']/doc:element/doc:element/doc:field[@name='value']">
    <dc:type><xsl:value-of select="."/></dc:type>
  </xsl:for-each>
 
  I know what the "." does, but the other select statement is a bit foreign
  to me.  So my question is, does anyone know of some reference material
 that
  can help me make sense of this select?  I need to understand what it is
  doing so I can make my own.  Thanks for any insight you can provide.
 
  Matt Sherman
 



[CODE4LIB] Fwd: [LAWDI] ISAW Papers 7 available

2014-07-09 Thread Ethan Gruber
This may interest some people: current state of linked open data within
classics/classical archaeology. These papers are from the NEH-funded Linked
Ancient World Data Institute, held at the Institute for the Study of the
Ancient World at NYU in 2012 and Drew University in 2013.

Ethan

-- Forwarded message --
From: Sebastian Heath sebastian.he...@gmail.com
Date: Tue, Jul 8, 2014 at 6:58 PM
Subject: [LAWDI] ISAW Papers 7 available
To: la...@googlegroups.com la...@googlegroups.com


Greetings All,

 ISAW Papers 7 is available at

 http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/ .

 An important note: There is an update pending. The NYU library will
get to that very shortly so please don't worry if the latest edits you
sent me aren't visible at this moment. I think I'm completely caught
up.

 That link had started to circulate - I bear some responsibility for
that - and we've received queries as to the work's status, along with
positive responses. So let's call it available and tweet, cite, use,
etc. the content. That seems the LAWDI way.

 Second point: VIAF IDs are still making their way through the
system. I'll update as they become live.

 Many thanks to you all, and to repeat, tweeting, citing, linking from
academia.edu or similar are all highly encouraged.

 Best,

 Sebastian.

--
You received this message because you are subscribed to the Google Groups
LAWDI group.
To unsubscribe from this group and stop receiving emails from it, send an
email to lawdi+unsubscr...@googlegroups.com.
Visit this group at http://groups.google.com/group/lawdi.
For more options, visit https://groups.google.com/d/optout.


[CODE4LIB] Archival linked open data: a discussion

2014-05-16 Thread Ethan Gruber
I understand that there is undoubtedly some overlap between this list and
LODLAM (Linked Open Data for Libraries, Archives, and Museums), but I
wanted to pass along a link to a discussion I started in the LODLAM list
about the application of RDF and linked data ontologies to archival
materials and authorities.

There are certainly some very knowledgeable LOD people on this list, and
therefore I don't want the discussion on LODLAM to slip through the cracks.
The application of linked data methodologies is tremendously important to
the archival community.

Here's the permalink to the thread in Google Groups:
https://groups.google.com/d/topic/lod-lam/sIrCqZPaZ8c/discussion

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] outside of libraryland,

2014-03-19 Thread Ethan Gruber
LODLAM, LAWDI (linked ancient world data institute/initiative), CAA
conference (computer applications in archaeology).
 On Mar 19, 2014 8:20 PM, Coral Sheldon-Hess co...@sheldon-hess.org
wrote:

 I co-founded and co-host a learn-to-code workshop for women and friends,
 locally. (Men are welcomed as long as they are guests of female-identified
 participants.) Like Girl Develop It, but free--and we avoided the color
 pink.

 I'm also nominally on the planning committee for the local hackathon
 (though I mostly just show up at the event itself), and I show up at Code
 for Anchorage (Code for America) meetings at least once a year. :)

 I'm not sure if it counts as belonging, per se, but I'm a lurker on the
 OpenHatch mailing list, and I participate in the Geek Feminism community.
 Until the organizer moved away, I went to local Raspberry Pi hack nights,
 every few weeks.

 Anchorage is small (300k people), so there's no Python Users Group or
 RailsBridge or anything like that, here. There's a Drupal Users Group, and
 I'm on their Meetup; we'll see if I ever show up, though. ;) I dropped our
 local Linux Users Group, because they're mostly just a mailing list for
 flamewars, nowadays; I don't even think they have meetings anymore. ...
 Which gets more at lack of overlap than overlap, doesn't it?

 --
 Coral Sheldon-Hess
 http://sheldon-hess.org/coral
 @web_kunoichi


 On Fri, Mar 14, 2014 at 4:35 PM, Nate Hill nathanielh...@gmail.com
 wrote:

  what coding and technology groups do people on this list belong to and
 find
  valuable?
  I'm curious about how code4lib overlaps (or doesn't) with other domains.
  thanks,
  Nate
 
  --
  Nate Hill
  nathanielh...@gmail.com
  http://4thfloor.chattlibrary.org/
  http://www.natehill.net
 



Re: [CODE4LIB] ArchivesSpace v1.0.7 Released [linked data]

2014-03-06 Thread Ethan Gruber
The issue here that I see is that D2RQ will expose the MySQL database
structure as linked data in some sort of indecipherable ontology and the
end result is probably useless. What Mark alludes to here is that the
developers of ArchivesSpace could write scripts, inherent to the platform,
that could output linked data that conforms to existing or emerging
standards. This is much simpler than introducing D2RQ into the application
layer, and allows for greater control of the export models. As a developer
of different, potentially competing, software applications for EAD and
EAC-CPF publication, I would ask: who is to say that ArchivesSpace database
field names should become standards or best practices? These are things that
should be determined by the archival community, not by a software application.

CIDOC-CRM is capable of representing the structure and relationships
between components of an archival collection. I'm not a huge advocate of
the CRM because I think it has a tendency to be inordinately complex, but
*it* is a standard. Therefore, if the archival community decided that it
would adopt CRM as its RDF data model standard, ArchivesSpace, ICA-AtoM,
EADitor, and other archival management/description systems could adapt to
the needs of the community and offer content in these models.
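
To make the alternative concrete, the sort of export script I have in mind
would emit something like this for a single archival object -- the property
choices and URIs are purely illustrative, not a proposal for what
ArchivesSpace itself should produce:

  {
    "@context": { "dcterms": "http://purl.org/dc/terms/" },
    "@id": "http://archives.example.org/resources/123",
    "@type": "dcterms:BibliographicResource",
    "dcterms:title": "Edward T. Newell correspondence",
    "dcterms:creator": { "@id": "http://archives.example.org/agents/42" },
    "dcterms:isPartOf": { "@id": "http://archives.example.org/resources/120" }
  }

The point is that the vocabulary is chosen deliberately, rather than being
whatever the database column names happen to be.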

Ethan


On Thu, Mar 6, 2014 at 10:41 AM, Eric Lease Morgan emor...@nd.edu wrote:

 On Mar 6, 2014, at 9:47 AM, Mark A. Matienzo mark.matie...@gmail.com
 wrote:

  ArchivesSpace has a REST backend API, and requests yield a response in
  JSON. As one option, I'd investigate to publish linked data as JSON-LD.
  Some degree of mapping would be necessary, but I imagine it would be
  significantly easier to that instead of using something like D2RQ.


 If I understand things correctly, using D2RQ to publish database contents
 as linked data is mostly a systems administration task:

   1. download and install D2RQ
   2. run D2RQ-specific script to read a (ArchiveSpace) database schema and
 create a configuration file
   3. run D2RQ with the configuration file
   4. provide access via standard linked data publishing methods
   5. done

 If the field names in the initial database are meaningful, and if the
 database schema is normalized, then D2RQ ought to work pretty well. If many
 archives use ArchiveSpace, then the field names can become “standard” or at
 least “best practices”, and the resulting RDF will be well linked.

 I have downloaded and run ArchiveSpace on my desktop computer. It imported
 some of my EAD files pretty well. It created EAC-CPF files from my names.
 Fun. I didn’t see a way to export things as EAD. The whole interface is
 beautiful and functional. In my copious spare time I will see about
 configuring ArchiveSpace to use a MySQL backend (instead of the embedded
 database), and see about putting D2RQ on top. I think this will be easier
 than learning a new API and building an entire linked data publishing
 system. D2RQ may be a viable option with the understanding that no
 solution is perfect.

 —
 Eric Morgan



[CODE4LIB] xEAC, EAC-CPF publication framework, beta ready for testing

2014-03-06 Thread Ethan Gruber
xEAC is an open-source XForms-based application for creating and managing
EAC-CPF collections. The XForms backend allows editing of the XML documents
in a web form, and relationships between source and target entities are
maintained automatically. It is available at https://github.com/ewg118/xEAC.

I have finally gotten xEAC to a stage where I feel it is ready for wider
testing (and I have updated the installation documentation). This has been
a few months coming, since I had intended to release the beta shortly after
MARAC in November. The xEAC documentation can be found here:
http://wiki.numismatics.org/xeac:xeac

Features

-Create, edit, publish EAC-CPF documents. Most, but not all, EAC-CPF
elements are supported.
-Public user interface migrated to bootstrap 3 to support mobile devices.
-Maps and timelines for visualization of life events.
-Basic faceted search and Solr-based Atom feed in the UI.
-Export in EAC-CPF, KML, and rudimentary RDF/XML. HTML5+RDFa available in
entity record pages.
-Manage semantic relationships between identities (
http://eaditor.blogspot.com/2013/11/maintaining-relationships-in-eac-cpf.html).
Target records are automatically updated with symmetrical or inverse
relationships, where relevant, and relationships are expressed in the RDF
output. TODO: parse relationship ontologies defined in RDF (e.g.,
http://vocab.org/relationship/.rdf) for use in xEAC.

REST interactions

The XForms engine interacts with the following web services to import name
authorities, biographical, or geographic information:

-VIAF lookup
-DBPedia import
-Geonames for modern places (placeEntry element)
-Pleiades Gazetteer of Ancient Places (placeEntry)
-Getty AAT SPARQL (occupation element) (
http://eaditor.blogspot.com/2014/03/linking-eac-cpf-occupations-to-getty-aat.html
)
-SPARQL query mechanism of nomisma.org in the UI (and extensible,
generalizable lookup widgets)

When the OCLC linked data service supports queries by VIAF URI, I will
create a lookup widget to provide lists of related bibliographic resources.

TODO list

I aim to improve xEAC over the following months and incorporate the
following:

-Finish form: Represent all EAC-CPF elements and attributes
-Test for scalability
-Interface with more APIs in the editing interface
-Improve public interface, especially searching and browsing
-Employ SPARQL endpoint for more sophisticated querying and visualization,
automatically publish to SPARQL on EAC-CPF record save.
-Incorporate social network graph visualization (see SPARQL, above)
-Follow evolving best practices in RDF, support export in TEI for
prosopographies (http://wiki.tei-c.org/index.php/Prosopography) and
CIDOC-CRM.
-Interact with SNAC or international entity databases which evolve from it.

Resources:
Blog: http://eaditor.blogspot.com/
MARAC slideshow:
http://eaditor.blogspot.com/2013/11/marac-fall-2013-presentation.html
Prototype site: http://admin.numismatics.org/xeac/


Re: [CODE4LIB] ArchivesSpace v1.0.7 Released [linked data]

2014-03-06 Thread Ethan Gruber
I think that RDFa provides the lowest barrier to entry. Using dcterms for
publisher, creator, title, etc. is a good place to start, and if your
collection (archival, library, museum) links to terms defined in LOD
vocabulary systems (LCSH, Getty, LCNAF, whatever), output these URIs in the
HTML interface and tag them in RDFa in such a way that they are
semantically meaningful, e.g., a href=http://vocab.getty.edu/aat/300028569;
rel=dcterms:formatmanuscripts (document genre)/a

It would be great if content management systems supported RDFa right out of
the box, and perhaps they are all moving in this direction. But you don't
need a content management system to do this. If you generate static HTML
files for your finding aids from EAD files using XSLT, you can tweak your
XSLT output to handle RDFa.
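
A fragment of what that output might look like -- every URI below is a
placeholder except the Getty AAT term:

  <div about="http://findingaids.example.org/ead/mss123"
       prefix="dcterms: http://purl.org/dc/terms/">
    <h1 property="dcterms:title">Papers of Jane Doe, 1900-1950</h1>
    <p>Creator:
      <a property="dcterms:creator"
         href="http://id.loc.gov/authorities/names/n00000000">Doe, Jane</a></p>
    <p>Genre:
      <a property="dcterms:format"
         href="http://vocab.getty.edu/aat/300028569">manuscripts (document genre)</a></p>
  </div>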

Ethan


On Thu, Mar 6, 2014 at 12:56 PM, Eric Lease Morgan emor...@nd.edu wrote:

 Let me ask a more direct question. If participating in linked data is a
 “good thing”, then how do you — anybody here — suggest archivists (or
 librarians or museum curators) do that starting today? —Eric Morgan



Re: [CODE4LIB] links from finding aid to digital object

2014-01-15 Thread Ethan Gruber
You could also try the EAD list if you need more examples.
On Jan 15, 2014 8:45 AM, Edward Summers e...@pobox.com wrote:

 Thanks for all the responses about linking finding aids to digital objects
 yesterday — it was very helpful! I haven’t done much work (yet) looking to
 see what the patterns are. But a few people contacted me asking me to
 provide the results. so I have pulled out the examples into a document
 that’s up on Github:

 https://github.com/edsu/eadlinks

 If you don’t want your name/email listed let me know. I thought it might
 be helpful for anyone that wanted to follow up.

 //Ed



Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
I'm not sure that I agree that RDF is not a serialization.  It really
depends on the context of the system and intended use of the linked data.
For example, TEI is designed with a specific purpose which cannot be
replicated in RDF (at least, not very easily at all), but deriving RDF from
highly-linked TEI to put into an endpoint can open doors to queries which
are otherwise impossible to make on the data.  This certainly requires some
rethinking of the way texts interact.  But perhaps it may be best to say
that RDF *can* (but not necessarily) be a derivation, rather than
serialization, of some larger, more complex canonical data model.
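
A trivial illustration of the kind of derivation I mean, with made-up URIs: a
TEI reference such as

  <persName ref="http://viaf.org/viaf/12345">Edward T. Newell</persName>

can be boiled down to a triple like

  <http://example.org/texts/letter1> dcterms:references <http://viaf.org/viaf/12345> .

and once thousands of those sit in an endpoint you can ask which texts mention
people who also appear in some other collection -- a question the TEI files
alone cannot answer efficiently.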

Ethan


On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein 
arubi...@library.umass.edu wrote:

 I think you’ve hit the nail on the head here, Karen. I would just add, or
 maybe reassure, that this does not necessarily require rethinking your
 existing metadata but how to translate that existing metadata into a linked
 data environment. Though this might seem like a pain, in many cases it will
 actually inspire you to go back and improve/increase the value of that
 existing metadata.

 This is definitely looking awesome, Eric!

 Aaron

 On Nov 19, 2013, at 9:41 AM, Karen Coyle li...@kcoyle.net wrote:

  Eric, I think this skips a step - which is the design step in which you
 create a domain model that uses linked data as its basis. RDF is not a
 serialization; it actually may require you to re-think the basic structure
 of your metadata. The reason for that is that it provides capabilities that
 record-based data models do not. Rather than starting with current
 metadata, you need to take a step back and ask: what does my information
 world look like as linked data?
 
  I repeat: RDF is NOT A SERIALIZATION.
 
  kc
 
  On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
  I believe participating in the Semantic Web and providing content via
 the principles of linked data is not rocket surgery, especially for
 cultural heritage institutions -- libraries, archives, and museums. Here is
 a simple recipe for their participation:
 
1. use existing metadata standards (MARC, EAD, etc.) to describe
   collections
 
2. use any number of existing tools to convert the metadata to
   HTML, and save the HTML on a Web server
 
3. use any number of existing tools to convert the metadata to
   RDF/XML (or some other serialization of RDF), and save the
   RDF/XML on a Web server
 
4. rest, congratulate yourself, and share your experience with
   others in your domain
 
5. after the first time though, go back to Step #1, but this time
   work with other people inside your domain making sure you use as
   many of the same URIs as possible
 
6. after the second time through, go back to Step #1, but this
   time supplement access to your linked data with a triple store,
   thus supporting search
 
7. after the third time through, go back to Step #1, but this
   time use any number of existing tools to expose the content in
   your other information systems (relational databases, OAI-PMH
   data repositories, etc.)
 
8. for dessert, cogitate ways to exploit the linked data in your
   domain to discover new and additional relationships between URIs,
   and thus make the Semantic Web more of a reality
 
  What do you think?
 
  I am in the process of writing a guidebook on the topic of linked data
 and archives. In the guidebook I will elaborate on this recipe and provide
 instructions for its implementation. [1]
 
  [1] guidebook - http://sites.tufts.edu/liam/
 
  --
  Eric Lease Morgan
  University of Notre Dame
 
  --
  Karen Coyle
  kco...@kcoyle.net http://kcoyle.net
  m: 1-510-435-8234
  skype: kcoylenet



Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
I see that serialization has a different definition in computer science
than I thought it did.


On Tue, Nov 19, 2013 at 10:36 AM, Ross Singer rossfsin...@gmail.com wrote:

 That's still not a serialization.  It's just a similar data model.
  Pretty huge difference.

 -Ross.


 On Tue, Nov 19, 2013 at 10:31 AM, Ethan Gruber ewg4x...@gmail.com wrote:

  I'm not sure that I agree that RDF is not a serialization.  It really
  depends on the context of the system and intended use of the linked data.
  For example, TEI is designed with a specific purpose which cannot be
  replicated in RDF (at least, not very easily at all), but deriving RDF
 from
  highly-linked TEI to put into an endpoint can open doors to queries which
  are otherwise impossible to make on the data.  This certainly requires
 some
  rethinking of the way texts interact.  But perhaps it may be best to say
  that RDF *can* (but not necessarily) be a derivation, rather than
  serialization, of some larger, more complex canonical data model.
 
  Ethan
 
 
  On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein 
  arubi...@library.umass.edu wrote:
 
   I think you’ve hit the nail on the head here, Karen. I would just add,
 or
   maybe reassure, that this does not necessarily require rethinking your
   existing metadata but how to translate that existing metadata into a
  linked
   data environment. Though this might seem like a pain, in many cases it
  will
   actually inspire you to go back and improve/increase the value of that
   existing metadata.
  
   This is definitely looking awesome, Eric!
  
   Aaron
  
   On Nov 19, 2013, at 9:41 AM, Karen Coyle li...@kcoyle.net wrote:
  
Eric, I think this skips a step - which is the design step in which
 you
   create a domain model that uses linked data as its basis. RDF is not a
   serialization; it actually may require you to re-think the basic
  structure
   of your metadata. The reason for that is that it provides capabilities
  that
   record-based data models do not. Rather than starting with current
   metadata, you need to take a step back and ask: what does my
 information
   world look like as linked data?
   
I repeat: RDF is NOT A SERIALIZATION.
   
kc
   
On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
I believe participating in the Semantic Web and providing content
 via
   the principles of linked data is not rocket surgery, especially for
   cultural heritage institutions -- libraries, archives, and museums.
 Here
  is
   a simple recipe for their participation:
   
  1. use existing metadata standards (MARC, EAD, etc.) to describe
 collections
   
  2. use any number of existing tools to convert the metadata to
 HTML, and save the HTML on a Web server
   
  3. use any number of existing tools to convert the metadata to
 RDF/XML (or some other serialization of RDF), and save the
 RDF/XML on a Web server
   
  4. rest, congratulate yourself, and share your experience with
 others in your domain
   
  5. after the first time though, go back to Step #1, but this time
 work with other people inside your domain making sure you use
 as
 many of the same URIs as possible
   
  6. after the second time through, go back to Step #1, but this
 time supplement access to your linked data with a triple store,
 thus supporting search
   
  7. after the third time through, go back to Step #1, but this
 time use any number of existing tools to expose the content in
 your other information systems (relational databases, OAI-PMH
 data repositories, etc.)
   
  8. for dessert, cogitate ways to exploit the linked data in your
 domain to discover new and additional relationships between
 URIs,
 and thus make the Semantic Web more of a reality
   
What do you think?
   
I am in the process of writing a guidebook on the topic of linked
 data
   and archives. In the guidebook I will elaborate on this recipe and
  provide
   instructions for its implementation. [1]
   
[1] guidebook - http://sites.tufts.edu/liam/
   
--
Eric Lease Morgan
University of Notre Dame
   
--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet
  
 



Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
yo, i get it


On Tue, Nov 19, 2013 at 10:54 AM, Ross Singer rossfsin...@gmail.com wrote:

 I don't know what your definition of serialization is, but I don't know
 of any where data model and formatted output of a data model are
 synonymous.

 RDF is a data model *not* a serialization.

 -Ross.


 On Tue, Nov 19, 2013 at 10:45 AM, Ethan Gruber ewg4x...@gmail.com wrote:

  I see that serialization has a different definition in computer science
  than I thought it did.
 
 
  On Tue, Nov 19, 2013 at 10:36 AM, Ross Singer rossfsin...@gmail.com
  wrote:
 
   That's still not a serialization.  It's just a similar data model.
Pretty huge difference.
  
   -Ross.
  
  
   On Tue, Nov 19, 2013 at 10:31 AM, Ethan Gruber ewg4x...@gmail.com
  wrote:
  
I'm not sure that I agree that RDF is not a serialization.  It really
depends on the context of the system and intended use of the linked
  data.
For example, TEI is designed with a specific purpose which cannot be
replicated in RDF (at least, not very easily at all), but deriving
 RDF
   from
highly-linked TEI to put into an endpoint can open doors to queries
  which
are otherwise impossible to make on the data.  This certainly
 requires
   some
rethinking of the way texts interact.  But perhaps it may be best to
  say
that RDF *can* (but not necessarily) be a derivation, rather than
serialization, of some larger, more complex canonical data model.
   
Ethan
   
   
On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein 
arubi...@library.umass.edu wrote:
   
 I think you’ve hit the nail on the head here, Karen. I would just
  add,
   or
 maybe reassure, that this does not necessarily require rethinking
  your
 existing metadata but how to translate that existing metadata into
 a
linked
 data environment. Though this might seem like a pain, in many cases
  it
will
 actually inspire you to go back and improve/increase the value of
  that
 existing metadata.

 This is definitely looking awesome, Eric!

 Aaron

 On Nov 19, 2013, at 9:41 AM, Karen Coyle li...@kcoyle.net wrote:

  Eric, I think this skips a step - which is the design step in
 which
   you
 create a domain model that uses linked data as its basis. RDF is
 not
  a
 serialization; it actually may require you to re-think the basic
structure
 of your metadata. The reason for that is that it provides
  capabilities
that
 record-based data models do not. Rather than starting with current
 metadata, you need to take a step back and ask: what does my
   information
 world look like as linked data?
 
  I repeat: RDF is NOT A SERIALIZATION.
 
  kc
 
  On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
  I believe participating in the Semantic Web and providing
 content
   via
 the principles of linked data is not rocket surgery, especially
 for
 cultural heritage institutions -- libraries, archives, and museums.
   Here
is
 a simple recipe for their participation:
 
1. use existing metadata standards (MARC, EAD, etc.) to
 describe
   collections
 
2. use any number of existing tools to convert the metadata to
   HTML, and save the HTML on a Web server
 
3. use any number of existing tools to convert the metadata to
   RDF/XML (or some other serialization of RDF), and save
 the
   RDF/XML on a Web server
 
4. rest, congratulate yourself, and share your experience with
   others in your domain
 
5. after the first time though, go back to Step #1, but this
  time
   work with other people inside your domain making sure you
 use
   as
   many of the same URIs as possible
 
6. after the second time through, go back to Step #1, but this
   time supplement access to your linked data with a triple
  store,
   thus supporting search
 
7. after the third time through, go back to Step #1, but this
   time use any number of existing tools to expose the content
  in
   your other information systems (relational databases,
 OAI-PMH
   data repositories, etc.)
 
8. for dessert, cogitate ways to exploit the linked data in
 your
   domain to discover new and additional relationships between
   URIs,
   and thus make the Semantic Web more of a reality
 
  What do you think?
 
  I am in the process of writing a guidebook on the topic of
 linked
   data
 and archives. In the guidebook I will elaborate on this recipe and
provide
 instructions for its implementation. [1]
 
  [1] guidebook - http://sites.tufts.edu/liam/
 
  --
  Eric Lease Morgan
  University of Notre Dame
 
  --
  Karen Coyle
  kco...@kcoyle.net http://kcoyle.net
  m: 1-510-435-8234
  skype: kcoylenet

   
  
 



Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
Hasn't the pendulum swung back toward RDFa Lite (
http://www.w3.org/TR/rdfa-lite/) recently?  They are fairly equivalent, but
I'm not sure about all the politics involved.
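
In practice the two look almost interchangeable for this kind of display
markup; here is the same statement both ways, purely as an illustration (title
borrowed from the WorldCat example cited in the message below):

  <!-- schema.org microdata -->
  <div itemscope itemtype="http://schema.org/Book">
    <span itemprop="name">Selection of Early Statistical Papers of J. Neyman</span>
  </div>

  <!-- the same thing in RDFa Lite -->
  <div vocab="http://schema.org/" typeof="Book">
    <span property="name">Selection of Early Statistical Papers of J. Neyman</span>
  </div>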


On Tue, Nov 19, 2013 at 11:09 AM, Karen Coyle li...@kcoyle.net wrote:

 Eric, if you want to leap into the linked data world in the fastest,
 easiest way possible, then I suggest looking at microdata markup, e.g.
 schema.org.[1] Schema.org does not require you to transform your data at
 all: it only requires mark-up of your online displays. This makes sense
 because as long as your data is in local databases, it's not visible to the
 linked data universe anyway; so why not take the easy way out and just add
 linked data to your public online displays? This doesn't require a
 transformation of your entire record (some of which may not be suitable as
 linked data in any case), only those things that are likely to link
 usefully. This latter generally means things for which you have an
 identifier. And you make no changes to your database, only to display.

 OCLC is already producing this markup in WorldCat records [2]-- not
 perfectly, of course, lots of warts, but it is a first step. However, it is
 a first step that makes more sense to me than *transforming* or
 *cross-walking* current metadata. It also, I believe, will help us
 understand what bits of our current metadata will make the transition to
 linked data, and what bits should remain as accessible documents that users
 can reach through linked data.

 kc
 [1] http://schema.org, and look at the work going on to add bibliographic
 properties at http://www.w3.org/community/schemabibex/wiki/Main_Page
 [2] look at the linked data section of any WorldCat page for a single
 item, such as http://www.worldcat.org/title/selection-of-early-statistical-papers-of-j-neyman/oclc/527725?referer=brief_results




 On 11/19/13 7:54 AM, Eric Lease Morgan wrote:

 On Nov 19, 2013, at 9:41 AM, Karen Coyle li...@kcoyle.net wrote:

  Eric, I think this skips a step - which is the design step in which you
 create a domain model that uses linked data as its basis. RDF is not a
 serialization; it actually may require you to re-think the basic
 structure of your metadata. The reason for that is that it provides
 capabilities that record-based data models do not. Rather than starting
 with current metadata, you need to take a step back and ask: what does
 my information world look like as linked data?


 I respectfully disagree. I do not think it necessary to create a domain
 model ahead of time; I do not think it is necessary for us to re-think our
 metadata structures. There already exists tools enabling us — cultural
 heritage institutions — to manifest our metadata as RDF. The manifestations
 may not be perfect, but “we need to learn to walk before we run” and the
 metadata structures we have right now will work for right now. As we mature
 we can refine our processes. I do not advocate “stepping back and asking”.
 I advocate looking forward and doing. —Eric Morgan


 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 m: 1-510-435-8234
 skype: kcoylenet



Re: [CODE4LIB] Charlotte, NC Code4Lib Meeting

2013-11-14 Thread Ethan Gruber
Asheville +1


On Thu, Nov 14, 2013 at 4:20 PM, Simon Spero sesunc...@gmail.com wrote:

 Anyone thought about doing a code4lib in Asheville?
 What about Raleigh?
 :-P
 On Nov 12, 2013 8:42 PM, Kevin S. Clarke kscla...@gmail.com wrote:

  I'd be interested. I'm in Boone... not too far a drive. :)
 
  Kevin
  On Nov 12, 2013 6:35 PM, Riley Childs ri...@tfsgeo.com wrote:
 
   Is anyone in Charlotte, NC (and surrounding areas) interested in
  starting a
   Code4Lib meeting?
   Just kind of asking :{D!
   *Riley Childs*
   *Library Technology Manager at Charlotte United Christian Academy
   http://cucawarriors.com/*
    *Head Programmer/Manager at Open Library Management Project
    http://openlibman.sourceforge.net/*
   *Cisco Certified Entry Network Technician *
   _
  
   *Phone: +1 (704) 497-2086*
   *email: ri...@tfsgeo.com ri...@tfsgeo.com*
   *Twitter: @RowdyChildren http://twitter.com/rowdychildren*
  
 



Re: [CODE4LIB] Charlotte, NC Code4Lib Meeting

2013-11-12 Thread Ethan Gruber
I'm in Virginia and might attend said meeting, even if I can't help
organize.
On Nov 12, 2013 6:35 PM, Riley Childs ri...@tfsgeo.com wrote:

 Is anyone in Charlotte, NC (and surrounding areas) interested in starting a
 Code4Lib meeting?
 Just kind of asking :{D!
 *Riley Childs*
 *Library Technology Manager at Charlotte United Christian Academy
 http://cucawarriors.com/*
  *Head Programmer/Manager at Open Library Management Project
  http://openlibman.sourceforge.net/*
 *Cisco Certified Entry Network Technician *
 _

 *Phone: +1 (704) 497-2086*
 *email: ri...@tfsgeo.com ri...@tfsgeo.com*
 *Twitter: @RowdyChildren http://twitter.com/rowdychildren*



Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Ethan Gruber
I've been using Apache Fuseki (
http://jena.apache.org/documentation/serving_data/) for almost a year, in
production since the spring.  It's a SPARQL server with a built-in TDB store.
It's easy to use, and takes about 5 minutes to get working on your desktop
or server.
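
For anyone who wants to try it, the quick start is roughly this (flags as of
the Fuseki 1.x releases -- check the current documentation):

  # in-memory dataset with SPARQL Update enabled, served at /ds
  ./fuseki-server --update --mem /ds

  # or persist to a TDB directory instead of keeping everything in memory
  ./fuseki-server --update --loc=DB /ds

  # load a file into the default graph, then query at http://localhost:3030/ds/query
  ./s-put http://localhost:3030/ds/data default mydata.rdf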

Ethan


On Mon, Nov 11, 2013 at 1:17 AM, Richard Wallis 
richard.wal...@dataliberate.com wrote:

 I've had some success with 4Store: http://4store.org

 Used it on mac laptop to load the WorldCat most highly held resources:
 http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/

 As to the point about loading RDF/XML, especially if you have a large
 amount of data.

- Triplestores much prefer raw triples for large amounts of data
- Chopping up files of triples into smaller chunks is also often
beneficial as it reduces memory footprints and can take advantage of
multithreading.  It is also far easier to recover from errors such as
 bad
data etc.
 - A bit of unix command line wizardry (split followed by a simple for-loop)
is fairly standard practice

 Also raw triples are often easier to produce - none of that mucking about
 producing correctly formatted XML - and you can chop, sort, and play about
 with them using powerful unix command line tools.

 ~Richard.


 On 11 November 2013 18:19, Scott Turnbull scott.turnb...@aptrust.org
 wrote:

  I've primarily used Sesame myself.  The http based queries made it pretty
  easy to script against.
 
  http://www.openrdf.org/
 
 
  On Mon, Nov 11, 2013 at 12:12 AM, Eric Lease Morgan emor...@nd.edu
  wrote:
 
   What is your favorite RDF triplestore?
  
   I am able to convert numerous library-related metadata formats into
   RDF/XML. In a minimal way, I can then contribute to the Semantic Web by
   simply putting the resulting files on an HTTP file system. But if I
 were
  to
   import my RDF/XML into a triplestore, then I could do a lot more. Jena
   seems like a good option. So does Openlink Virtuoso.
  
   What experience do y'all have with these tools, and do you know how to
   import RDF/XML into them?
  
   --
   Eric Lease Morgan
  
 
 
 
  --
  *Scott Turnbull*
  APTrust Technical Lead
  scott.turnb...@aptrust.org
  www.aptrust.org
  678-379-9488
 



 --
 Richard Wallis
 Founder, Data Liberate
 http://dataliberate.com
 Tel: +44 (0)7767 886 005

 Linkedin: http://www.linkedin.com/in/richardwallis
 Skype: richard.wallis1
 Twitter: @rjw



Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-10 Thread Ethan Gruber
Does anyone have experience with an image zooming engine in conjunction
with image annotation? I don't want end users to annotate things
themselves, but allow them to click on annotations added by an archivist.

Thanks,
Ethan
On Nov 8, 2013 4:39 PM, Edward Summers e...@pobox.com wrote:

 I’m having trouble understanding who the user of this content you are
 putting into Omeka is, and what you are expecting them to do with it. But,
 ok …

 //Ed

 On Nov 8, 2013, at 4:22 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:

  It is sad to me that converting to PDF for viewing off the Web seems
 like
  the answer. Isn’t there a tiling viewer (like Leaflet) that could be
 used
  to render jpeg derivatives of the original tif files in Omeka?
 
 
  This should be pretty easy. But the issue with tiling is that the nav
  process is miserable for all but the shortest books. Most of the people
 who
  want to download want are looking for jpegs rather than source tiffs and
  one pdf instead of a bunch of tiffs (which is good since each one is
  typically over 100MB). Of course there are people who want the real deal,
  but that's actually a much less common use case.
 
  As Karen observes, downloading and viewing serve different use cases so
 of
  course we will provide both. IIP Image Server looks intriguing. But most
 of
  our users who want the full res stuff really just want to download the
  source tiffs which will be made available.
 
  kyle



Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-08 Thread Ethan Gruber
I've done something like this in imagemagick, and it worked quite well, so
I can vouch for this workflow.  But just to clarify, I presume you will be
creating static PDF files to place in the filesystem--not generate a PDF
dynamically through Omeka when a user clicks to download a PDF (as in,
Omeka fires off an imagemagick process).
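
A typical imagemagick invocation for this is a one-liner along these lines
(file names invented; tune the quality setting to taste and make sure the page
files sort in reading order):

  convert page-*.jpg -quality 80 book.pdf

For very long books keep an eye on memory, since convert holds every page in
memory while it builds the PDF.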

Ethan
On Nov 8, 2013 2:00 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:

 We are in the process of migrating our digital collections from CONTENTdm
 to Omeka and are trying to figure out what to do about the compound objects
 -- the vast majority of which are digitized books.

 The source files are actually hi res tiffs but since ginormous objects
 broken into hundreds of pieces (each of which can be well over 100MB in
 size) aren't exactly friendly to use, we'd like to stitch them into
 individual pdf's that can be viewed more conveniently

 My game plan is to simply have a script pull the files down as jpegs which
 can be fed to imagemagick which can theoretically do everything I need.
 However, I've never actually done anything like this before, so I wanted to
 see if there's a method that people have used for combining lots of images
 into pdfs that works particularly well. Thanks,

 kyle



Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-08 Thread Ethan Gruber
On the same note, I've had good experiences with using adore djatoka to
render jpeg2000 files. Maybe something better has since come along. I'm out
of touch with this type of technology.
On Nov 8, 2013 2:10 PM, Edward Summers e...@pobox.com wrote:

 It is sad to me that converting to PDF for viewing off the Web seems like
 the answer. Isn’t there a tiling viewer (like Leaflet) that could be used
 to render jpeg derivatives of the original tif files in Omeka?

 For an example of using Leaflet (usually used for working with maps) in
 this way checkout NYTimes Machine Beta:

 http://apps.beta620.nytimes.com/timesmachine/1969/07/20/issue.html

 //Ed

 On Nov 8, 2013, at 2:00 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:

  We are in the process of migrating our digital collections from CONTENTdm
  to Omeka and are trying to figure out what to do about the compound
 objects
  -- the vast majority of which are digitized books.
 
  The source files are actually hi res tiffs but since ginormous objects
  broken into hundreds of pieces (each of which can be well over 100MB in
  size) aren't exactly friendly to use, we'd like to stitch them into
  individual pdf's that can be viewed more conveniently
 
  My game plan is to simply have a script pull the files down as jpegs
 which
  can be fed to imagemagick which can theoretically do everything I need.
  However, I've never actually done anything like this before, so I wanted
 to
  see if there's a method that people have used for combining lots of
 images
  into pdfs that works particularly well. Thanks,
 
  kyle



Re: [CODE4LIB] rdf serialization

2013-11-06 Thread Ethan Gruber
I think that the answer to #1 is that if you want or expect people to use
your endpoint, you should document how it works: the ontologies, the
models, and a variety of example SPARQL queries, ranging from simple to
complex.  The British Museum's SPARQL endpoint (
http://collection.britishmuseum.org/sparql) is highly touted, but how many
people actually use it?  I understand your point about SPARQL being too
complicated for an API interface, but the best examples of services built
on SPARQL are probably the ones you don't even realize are built on SPARQL
(e.g., http://numismatics.org/ocre/id/ric.1%282%29.aug.4A#mapTab).  So on
one hand, perhaps only the most dedicated and hardcore researchers will
venture to construct SPARQL queries for your endpoint, but on the other,
you can build some pretty visualizations based on SPARQL queries conducted
in the background from the user's interaction with a simple html/javascript
based interface.

Ethan


On Wed, Nov 6, 2013 at 11:54 AM, Ross Singer rossfsin...@gmail.com wrote:

 Hey Karen,

 It's purely anecdotal (albeit anecdotes borne from working at a company
 that offered, and has since abandoned, a sparql-based triple store
 service), but I just don't see the interest in arbitrary SPARQL queries
 against remote datasets that I do against linking to (and grabbing) known
 items.  I think there are multiple reasons for this:

 1) Unless you're already familiar with the dataset behind the SPARQL
 endpoint, where do you even start with constructing useful queries?
 2) SPARQL as a query language is a combination of being too powerful and
 completely useless in practice: query timeouts are commonplace, endpoints
 don't support all of 1.1, etc.  And, going back to point #1, it's hard to
 know how to optimize your queries unless you are already pretty familiar
 with the data
 3) SPARQL is a flawed API interface from the get-go (IMHO) for the same
 reason we don't offer a public SQL interface to our RDBMSes

 Which isn't to say it doesn't have its uses or applications.

 I just think that in most cases domain/service-specific APIs (be they
 RESTful, based on the Linked Data API [0], whatever) will likely be favored
 over generic SPARQL endpoints.  Are n+1 different APIs ideal?  I am pretty
 sure the answer is no, but that's the future I foresee, personally.

 -Ross.
 0. https://code.google.com/p/linked-data-api/wiki/Specification


 On Wed, Nov 6, 2013 at 11:28 AM, Karen Coyle li...@kcoyle.net wrote:

  Ross, I agree with your statement that data doesn't have to be RDF all
  the way down, etc. But I'd like to hear more about why you think SPARQL
  availability has less value, and if you see an alternative to SPARQL for
  querying.
 
  kc
 
 
 
  On 11/6/13 8:11 AM, Ross Singer wrote:
 
  Hugh, I don't think you're in the weeds with your question (and, while I
  think that named graphs can provide a solution to your particular
 problem,
  that doesn't necessarily mean that it doesn't raise more questions or
  potentially more frustrations down the line - like any new power, it can
  be
  used for good or evil and the difference might not be obvious at first).
 
  My question for you, however, is why are you using a triple store for
  this?
That is, why bother with the broad and general model in what I assume
  is a
  closed world assumption in your application?
 
  We don't generally use XML databases (Marklogic being a notable
  exception),
  or MARC databases, or insert your transmission format of
 choice-specific
  databases because usually transmission formats are designed to account
 for
  lots and lots of variations and maximum flexibility, which generally is
  the
  opposite of the modeling that goes into a specific app.
 
  I think there's a world of difference between modeling your data so it
 can
  be represented in RDF (and, possibly, available via SPARQL, but I think
  there is *far* less value there) and committing to RDF all the way down.
RDF is a generalization so multiple parties can agree on what data
  means,
  but I would have a hard time swallowing the argument that
 domain-specific
  data must be RDF-native.
 
  -Ross.
 
 
  On Wed, Nov 6, 2013 at 10:52 AM, Hugh Cayless philomou...@gmail.com
  wrote:
 
   Does that work right down to the level of the individual triple though?
  If
  a large percentage of my triples are each in their own individual
 graphs,
  won't that be chaos? I really don't know the answer, it's not a
  rhetorical
  question!
 
  Hugh
 
  On Nov 6, 2013, at 10:40 , Robert Sanderson azarot...@gmail.com
 wrote:
 
   Named Graphs are the way to solve the issue you bring up in that post,
  in
  my opinion.  You mint an identifier for the graph, and associate the
  provenance and other information with that.  This then gets ingested
 as
 
  the
 
  4th URI into a quad store, so you don't lose the provenance
 information.
 
  In JSON-LD:
  {
    "@id" : "uri-for-graph",
    "dcterms:creator" : "uri-for-hugh",
    "@graph" : [
      // ... 

Re: [CODE4LIB] We should use HTTPS on code4lib.org

2013-11-04 Thread Ethan Gruber
NSA broke it already


On Mon, Nov 4, 2013 at 1:42 PM, William Denton w...@pobox.com wrote:

 I think it's time we made everything on code4lib.org use HTTPS by default
 and redirect people to HTTPS from HTTP when needed.  (Right now there's an
 outdated self-signed SSL certificate on the site, so someone took a stab at
 this earlier, but it's time to do it right.)

 StartCom gives free SSL certs [0], and there are lots of places that sell
 them for prices that seem to run over $100 per year (which seems ridiculous
 to me, but maybe there's a good reason).

 I don't know which is the best way to get a cert for a site like this, but
 if people agree this is the right thing to do, perhaps someone with some
 expertise could work with the Oregon State hosts?

 More broadly, I think everyone should be using HTTPS everywhere (and HTTPS
 Everywhere, the browser extension).  Are any of you implementing HTTPS on
 your institution's sites, and moving to it as default?  It's one of those
 slightly finicky things that on the surface isn't necessary (why bother
 with a library's opening hours or address?) but deeper down is, because
 everyone should be able to browse the web without being monitored.

 Bill

 [0] https://cert.startcom.org/

 --
 William Denton
 Toronto, Canada
 http://www.miskatonic.org/



[CODE4LIB] Numismatic Data Standards and Ontologies Roundtable at CAA 2014

2013-10-22 Thread Ethan Gruber
Andrew Meadows, Karsten Tolle, and David Wigg-Wolf invite participants for
a roundtable on numismatic data standards and exchange, to be held at the
Computer Applications and Quantitative Methods in Archaeology (CAA)
conference (http://caa2014.sciencesconf.org/), Paris, 22-25 April 2014.

Coins survive in vast numbers from many historical periods and cultures,
providing important evidence for a wide variety of social, political and
economic aspects of those cultures. But currently these data are only
potentially available, as differing national traditions have yet to
integrate their substantial datasets on the basis of shared vocabularies,
syntax and structure.

Building on the experience with Linked Data of projects such as nomisma.org,
the European Coin Find Network (ECFN:
http://www.ecfn.fundmuenzen.eu/Home.html) and Online Coins of the Roman
Empire (OCRE: http://numismatics.org/ocre/), the roundtable will provide a
forum for the presentation and discussion of (meta)data standards and
ontologies for data repositories containing information on coins, with a
view to advancing the possibilities of data exchange and facilitating
access to data across a range of repositories. The round table follows on
from the two joint meetings of nomisma.org and ECFN, which concentrated on
ancient, primarily Roman coins, held in Frankfurt, Germany in May 2012; and
Carnuntum, Austria in April 2013, which was attended by 25 participants
from 10 European countries and the USA. The round table is intended to
encourage discussion among a wider community, beyond that of ancient
numismatics, drawing together lessons from a broader range of projects, and
embedding the results in the more general landscape of cultural heritage
data management. Too often in the past numismatists have allowed themselves
to operate in isolation from other related disciplines, including
archaeology, a deficit that this session also aims to address.

Although the core data required to identify and describe coins of almost
all periods are relatively simple (e.g. issuer, mint, date, denomination,
material, weight, size, description of obverse and reverse, etc.), and this
can result in a significant degree of correlation between the structure of
different repositories, linking disparate numismatics repositories presents
a number of problems. Nevertheless, coins provide an ideal test bed for the
implementation of concepts such as Linked Data and the creation of
standardised thesauri, the lessons of which can be profitably applied to
other, more complex fields.

Organizers:

Dr Andrew Meadows
Deputy Director
American Numismatic Society

Dr Karsten Tolle
DBIS
Goethe University

Dr David Wigg-Wolf
Römisch-Germanische Kommission des Deutschen Archäologischen Instituts


Re: [CODE4LIB] CODE4LIB Digest - 12 Sep 2013 to 13 Sep 2013 (#2013-237)

2013-09-16 Thread Ethan Gruber
Using SPARQL to validate seems like tremendous overhead.  From the Gerber
abstract: "A total of 55 rules have been defined representing the
constraints and requirements of the OA Specification and Ontology. For each
rule we have defined a SPARQL query to check compliance." I hope this isn't
55 SPARQL queries per RDF resource.

Europeana's review of schematron indicated what I pointed out earlier, that
it confines one to using RDF/XML, which is sub-optimal in their own
words.  One could accept RDF in any serialization and then run it through
an RDF processor, like rapper (http://librdf.org/raptor/rapper.html), into
XML and then validate.  Eventually, when XPath/XSLT 3 supports JSON and
other non-XML data models, theoretically, schematron might then be able to
validate other serializations of RDF.  Ditto for XForms, which we are using
to validate RDF/XML.  Obviously, this is sub-optimal because our workflow
doesn't yet account for non-XML data.  We will probably go with the rapper
intermediary process until XForms 2 is released.
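
The rapper step itself is a one-liner, something like this (flags from memory
-- see the man page), turning Turtle into RDF/XML before validation:

  rapper -i turtle -o rdfxml input.ttl > input.rdf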

Ethan


On Mon, Sep 16, 2013 at 10:22 AM, Karen Coyle li...@kcoyle.net wrote:

 On 9/16/13 6:29 AM, aj...@virginia.edu wrote:


 I'd suggest that perhaps the confusion arises because "This instance is
 (not) 'valid' according to that ontology." might be inferred from an
 instance and an ontology (under certain conditions), and that's the soul of
 what we're asking when we define constraints on the data. Perhaps OWL can
 be used to express conditions of validity, as long as we represent the
 quality "valid" for use in inferences.


 Based on the results of the RDF Validation workshop [1], validation is
 being expressed today as SPARQL rules. If you express the rules in OWL then
 unfortunately you affect downstream re-use of your ontology, and that can
 create a mess for inferencing and can add a burden onto any reasoners,
 which are supposed to apply the OWL declarations.

 One participant at the workshop demonstrated a system that used the OWL
 constraints as constraints, but only in a closed system. I think that the
 use of SPARQL is superior because it does not affect the semantics of the
 classes and properties, only the instance data, and that means that the
 same properties can be validated differently for different applications or
 under different contexts. As an example, one community may wish to say that
 their metadata can have one and only one dc:title, while others may allow
 more than one. You do not want to constrain dc:title throughout the Web,
 only your own use of it. (Tom Baker and I presented a solution to this on
 the second day as Application Profiles [2], as defined by the DC community).

 kc
 [1] https://www.w3.org/2012/12/rdf-val/agenda
 [2] http://www.w3.org/2001/sw/wiki/images/e/ef/Baker-dc-abstract-model-revised.pdf


  - ---
 A. Soroka
 The University of Virginia Library

 On Sep 13, 2013, at 11:00 PM, CODE4LIB automatic digest system wrote:

  Also, remember that OWL does NOT constrain your data, it constrains only
 the inferences that you can make about your data. OWL operates at the
 ontology level, not the data level. (The OWL 2 documentation makes this
 more clear, in my reading of it. I agree that the example you cite sure
 looks like a constraint on the data... it's very confusing.)



 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 m: 1-510-435-8234
 skype: kcoylenet



Re: [CODE4LIB] Expressing negatives and similar in RDF

2013-09-13 Thread Ethan Gruber
+1


On Fri, Sep 13, 2013 at 8:51 AM, Esmé Cowles escow...@ucsd.edu wrote:

 Thomas-

 This isn't something I've run across yet.  But one thing you could do is
 create some URIs for different kinds of unknown/nonexistent titles:

 example:book1 dc:title example:unknownTitle
 example:book2 dc:title example:noTitle
 etc.

 You could then describe example:unknownTitle with a label or comment to
 fully describe the states you wanted to capture with the different
 categories.

 -Esme
 --
 Esme Cowles escow...@ucsd.edu

 Necessity is the plea for every infringement of human freedom. It is the
  argument of tyrants; it is the creed of slaves. -- William Pitt, 1783

 On 09/13/2013, at 7:32 AM, Meehan, Thomas t.mee...@ucl.ac.uk wrote:

  Hello,
 
  I'm not sure how sensible a question this is (it's certainly
 theoretical), but it cropped up in relation to a rare books cataloguing
 discussion. Is there a standard or accepted way to express negatives in
 RDF? This is best explained by examples, expressed in mock-turtle:
 
   If I want to say "this book has the title Cats in RDA" I would do
  something like:
 
   example:thisbook dc:title "Cats in RDA" .
 
  Normally, if a predicate like dc:title is not relevant to
 example:thisbook I believe I am right in thinking that it would simply be
 missing, i.e. it is not part of a record where a set number of fields need
 to be filled in, so no need to even make the statement. However, there are
 occasions where a positively negative statement might be useful. I
  understand OWL has a way of managing the statement "This book does not have
  the title Cats in RDA" [1]:
  
   []  rdf:type owl:NegativePropertyAssertion ;
   owl:sourceIndividual   example:thisbook ;
   owl:assertionProperty  dc:title ;
   owl:targetIndividual   "Cats in RDA" .
 
   However, it would be more useful, and quite common at least in a
  bibliographic context, to say "This book does not have a title". Ideally
  (?!) there would be an ontology of concepts like "none", "unknown", or even
  "something, but unspecified":
 
  This book has no title:
  example:thisbook dc:title hasobject:false .
 
  It is unknown if this book has a title (sounds undesirable but I can
 think of instances where it might be handy[2]):
  example:thisbook dc:title hasobject:unknown .
 
  This book has a title but it has not been specified:
  example:thisbook dc:title hasobject:true .
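
   A minimal sketch of how statements like these could be asserted and
   queried, assuming the hasobject: vocabulary above actually lived at some
   namespace (it is hypothetical), using Python and rdflib:

   from rdflib import Graph, Namespace

   EX = Namespace("http://example.org/")
   HASOBJECT = Namespace("http://example.org/hasobject#")   # hypothetical namespace
   DC = Namespace("http://purl.org/dc/elements/1.1/")

   g = Graph()
   g.add((EX.thisbook, DC.title, HASOBJECT.false))     # this book has no title
   g.add((EX.otherbook, DC.title, HASOBJECT.unknown))  # unknown whether it has one

   # everything positively asserted to lack a title
   for s in g.subjects(DC.title, HASOBJECT.false):
       print(s, "has no title")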
 
  In terms of cataloguing, the answer is perhaps to refer to the rules
 (which would normally mandate supplied titles in square brackets and so
 forth) rather than use RDF to express this kind of thing, although the
 rules differ depending on the part of description and, in the case of the
 kind of thing that prompted the question- the presence of clasps on rare
 books- there are no rules. I wonder if anyone has any more wisdom on this.
 
  Many thanks,
 
  Tom
 
  [1] Adapted from
 http://www.w3.org/2007/OWL/wiki/Primer#Object_Properties
  [2] Not many tbh, but e.g. a title in an unknown script or indecipherable
 hand.
 
  ---
 
  Thomas Meehan
  Head of Current Cataloguing
  Library Services
  University College London
  Gower Street
  London WC1E 6BT
 
  t.mee...@ucl.ac.uk



Re: [CODE4LIB] W3C RDF Validation Workshop

2013-09-12 Thread Ethan Gruber
RDF is not the be-all, end-all for representing information, so I don't know
if there is a point to defining a validation schema which can also be
represented in RDF, since requirements vary from model to model and project to
project.  If you were creating RDF/XML, you could enforce complex
validation through Schematron.  XForms 2.0 will support JSON and other
non-XML data models, so you could enforce complex validation through XForms
bindings, since XPath 3 will support parsing JSON, and thus JSON-LD.

Our project consists of (at the moment) tens of thousands of concepts
defined at URIs and represented by XHTML+RDFa fragments.  These bits of
XHTML are edited in XForms, so the validation is pretty tight.  The
XHTML+RDFa is transformed into RDF proper upon file save and posted into
our endpoint with the SPARQL/Update mechanism.
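
The update step might look roughly like the following sketch, assuming a
generic SPARQL 1.1 Update endpoint; the endpoint URL, graph, and triple are
invented for illustration, not the project's actual ones.

import requests

update = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
INSERT DATA {
  GRAPH <http://example.org/graph/concepts> {
    <http://example.org/id/concept1> skos:prefLabel "Example concept"@en .
  }
}
"""

resp = requests.post(
    "http://localhost:3030/dataset/update",   # e.g. a Fuseki update endpoint
    data={"update": update},                  # form-encoded, per the SPARQL 1.1 Protocol
    timeout=30,
)
resp.raise_for_status()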

But my broader point is: RDF (typically) is a derivative resource of a more
detailed data model.  In the case where the RDF is derivative of a
canonical resource/document, validation can be applied more consistently
during the editing process of the canonical resource.

Ethan


On Thu, Sep 12, 2013 at 11:19 AM, Karen Coyle li...@kcoyle.net wrote:

 I followed the W3C RDF Validation Workshop [1] over the last two days. The
 web page has both written papers and slides from each presentation.

 The short summary is that a number of users of RDF have found a need to do
 traditional style validation (required, one or more, must be numeric/from a
 list, etc.) on their RDF metadata. There is currently no RDF-based standard
 for defining validation rules, so each of these is an ad hoc solution which
 cannot be easily exchanged. [2]

 The actual technology of validation in all cases is SPARQL. Whether or not
 this really scales is one of the questions, but it seems pretty clear that
 SPARQL will continue to be the solution for the near future.

 I will try to write up a blog post that will give some more information.

 kc


  [1] https://www.w3.org/2012/12/rdf-val/agenda
 [2] nota bene: Although OWL appears to provide validation rules, the OWL
 rules only support inferencing. OWL cannot be used to constrain your data
 to valid values.

 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet



Re: [CODE4LIB] What do you want to learn about linked data?

2013-09-04 Thread Ethan Gruber
There's a lot of really great linked data stuff going on in classical
studies.  The Pelagios project (http://pelagios-project.blogspot.com/) is
one of the best examples because the bar for participation is set very
low.  The RDF model is very simple, linking objects (works of literature,
sculpture, papyri, coins, whatever) represented at URIs to URIs for places
defined in the Pleiades Gazetteer of Ancient Places (
http://pleiades.stoa.org/), enabling aggregation of content based on
geography.
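
A bare-bones sketch of that kind of object-to-place link, using Python and
rdflib; the object URI is invented, and dcterms:spatial stands in for the
richer annotation-based model Pelagios actually uses.

from rdflib import Graph, Namespace, URIRef

DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
coin = URIRef("http://example.org/collection/coin42")     # invented object URI
place = URIRef("http://pleiades.stoa.org/places/423025")  # a Pleiades place URI (Rome)
g.add((coin, DCTERMS.spatial, place))

print(g.serialize(format="turtle"))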

Ethan


On Wed, Sep 4, 2013 at 10:01 AM, Eric Lease Morgan emor...@nd.edu wrote:

 On Sep 4, 2013, at 9:42 AM, Eric Lease Morgan emor...@nd.edu wrote:

  I get the basic concepts of linked data.  But what I don't understand is
  why the idea has been around so long, yet there seems to be a dearth of
  useful applications that live up to the hype.  So, what I want to learn
  about linked data is: who's using it effectively?  Maybe there's lots of
  stuff out there that I just don't know about?
 
   I've been doing some reading and evaluating in regard to Linked Data
  [0], and I think the problem is multi-dimensional:


 And here is yet another perspective. Maybe Linked Data is really too hard
 to implement. Think OAI-PMH. It was supposed to be a low-barrier method for
 making metadata available to the world -- an idea not dissimilar to the
 ideas behind Linked Data and the Semantic Web. Heck, all you needed was
 Dublin Core and the creation of various XML streams distributed by servers
 who knew only a handful of commands.

 Unfortunately, few people went beyond Dublin Core and the weaknesses of
 the vocabulary became extremely apparent. Just look at the OAI available
 from things like ContentDM -- thin to say the least. In the end OAI was not
 seen as being as low-barrier as once thought. Low barrier for computer types, but
 not necessarily so for others. From the concluding remarks in a 2006 paper
 by Carl Lagoze given at JCDL:

   Metadata Aggregation and “Automated Digital Libraries”: A
   Retrospective on the NSDL Experience

   Over the last three years the NSDL CI team has learned that a
   seemingly modest architecture based on metadata harvesting is
   surprisingly difficult to manage in a large-scale implementation.
   The administrative difficulties result from a combination of
   provider difficulties with OAI-PMH and Dublin Core, the
   complexities in consistent handling of multiple metadata feeds
   over a large number of iterations, and the limitations of
   metadata quality remediation.

   http://arxiv.org/pdf/cs/0601125.pdf

 The issues with Linked Data and the Semantic Web may be similar, but does
 that mean we should give it a try?

 --
 Eric Lease Morgan



Re: [CODE4LIB] Subject Terms in Institutional Repositories

2013-08-30 Thread Ethan Gruber
I'd hold off on AAT until the release of the Getty vocabularies as linked
open data in the near future.  No sense in investing time to purchase or
otherwise harvest terms from the Getty's current framework when the
architecture is going to change very soon.

On a related note, the British Museum's art-related thesauri are already
linked open data, but not as transparent and accessible as one would prefer.

Ethan


On Fri, Aug 30, 2013 at 9:44 AM, Jacob Ratliff jaratlif...@gmail.comwrote:

 That does help, thanks.

 So, what you probably need to do then is take some time to strategically
 think about what you want the controlled vocabularies to accomplish, and
 what types of resources you have available to implement them.

 How granular do you want to be in each subject area? (e.g. Do you want to
 use MeSH https://www.nlm.nih.gov/mesh/ for all the medical information,
 or is that too detailed?)
 Are you just looking for cursory subject headings so that people can find a
 larger collection that they're looking for? (LoC could be good for this)
 Are you going to use a different controlled vocabulary for each collection?
 (e.g. MeSH for dentistry, LoC for general, etc.)
 Who is going to go back and re-tag all of the digital objects with new
 metadata?

 You can also look at www.taxonomywarehouse.com for some ideas of different
 controlled vocabularies that are available. I also recommend the Art and
 Architecture Thesaurus http://www.getty.edu/vow/AATSearchPage.jsp for
 art
 assets.

 Is this kind of what you're looking for? I highly recommend sitting down
 and defining what your goals are for the controlled vocabulary you want to
 implement, because that will inform that type of vocabulary you use.

 Jacob Ratliff
 Archivist/Taxonomy Librarian
 National Fire Protection Association


 On Fri, Aug 30, 2013 at 9:36 AM, Matthew Sherman
 matt.r.sher...@gmail.comwrote:

  Sorry, I probably should have provided a bit more depth.  It is a
  University Institutional Repository so we have a rather varied collection
  of materials from engineering to education to computer science to
  chiropractic to dental to some student theses and posters.  So I guess I
  need to find something at is extensible.  Does that provide a better idea
  or should I provide more info?
 
 
  On Fri, Aug 30, 2013 at 9:32 AM, Jacob Ratliff jaratlif...@gmail.com
  wrote:
 
   Hi Matt,
  
   It depends on the subject area of your repository. There are dozens of
   controlled vocabularies that exist (not including specific Enterprise
   Content Management controlled vocabularies). If you can describe your
   collection, people might be able to advise you better.
  
   Jacob Ratliff
   Archivist/Taxonomy Librarian
   National Fire Protection Association
  
  
   On Fri, Aug 30, 2013 at 9:26 AM, Matthew Sherman
   matt.r.sher...@gmail.comwrote:
  
Hello Code4Libbers,
   
I am working on cleaning up our institutional repository, and one of
  the
big areas of improvement needed is the list of terms from the subject
fields.  It is messy and I want to take the subject terms and place
  them
into a much better order.  I was contemplating using Library of
  Congress
Subject Headings, but I wanted to see what others have done in this
  area
   to
see if there is another good controlled vocabulary that could work
   better.
Any insight is welcome.  Thanks for your time everyone.
   
Matt Sherman
Digital Content Librarian
University of Bridgeport
   
  
 



Re: [CODE4LIB] linked archival metadata: a guidebook

2013-08-12 Thread Ethan Gruber
I'll implement your linked data specifications into EADitor as soon as
they're ready.  In fact, I began implementing Aaron Rubinstein's hybrid
arch/dc ontology (http://gslis.simmons.edu/archival/arch/index.html) a few
days ago.

Ethan


On Mon, Aug 12, 2013 at 9:23 AM, Stephen Marks steve.ma...@utoronto.cawrote:

 Hi Eric--

 Good luck! I'll be very interested to see how this shapes up.

 Best,

 Steve



 On Aug-12-2013 9:10 AM, Eric Lease Morgan wrote:

  This is the tiniest of introductions from a person who will be writing a
 text called Linked Archival Metadata: A Guidebook. The Guidebook will be
 the product of LiAM [0], and from the prospectus [1], the purpose of the
 Guidebook is to:

provide archivists with an overview of the current linked data
landscape, define basic concepts, identify practical strategies
for adoption, and emphasize the tangible payoffs for archives
implementing linked data. It will focus on clarifying why
archives and archival users can benefit from linked data and will
identify a graduated approach to applying linked data methods to
archival description.

 To these ends I plan to write towards three audiences: 1) the layman who
 knows nothing about linked data, 2) the archivist who wants to make their
 content available as linked data but does not know how, and 3) the computer
 technologist who knows how to make linked data accessible but does not know
 about archival practices.

 Personally, I have been dabbling on and off with linked data and the
 Semantic Web for a number of years. I have also been deeply involved with a
 project called the Catholic Research Resources Alliance [2] whose content
 mostly comes from archives. I hope to marry these two sets of experiences
 into something that will be useful to cultural heritage institutions,
 especially archives.

 The Guidebook is intended to be manifested in both book (PDF) and wiki
 forms. The work begins now and is expected to be completed by March 2014.
 On my mark. Get set. Go. Wish me luck, and let’s see if we can build some
 community.

 [0] LiAM - http://sites.tufts.edu/liam/
 [1] prospectus - http://bit.ly/15TX0rs
  [2] Catholic Research Resources Alliance - http://www.catholicresearch.net/

 --
 Eric Lease Morgan



 --



 Stephen Marks
 Digital Preservation Librarian
 Scholars Portal
 Ontario Council of University Libraries

 step...@scholarsportal.info
 416.946.0300



Re: [CODE4LIB] Python and Ruby

2013-07-30 Thread Ethan Gruber
All languages other than assembly are boutique and must be eliminated like
the cancer that they are.


On Tue, Jul 30, 2013 at 11:14 AM, Ross Singer rossfsin...@gmail.com wrote:

 What would you consider a boutique language?  What isn't?

 -Ross.


 On Tue, Jul 30, 2013 at 10:21 AM, Rich Wenger rwen...@mit.edu wrote:

  The proliferation of boutique languages is a cancer on our community.
   Each one is a YAP (Yet Another Priesthood), and little else.  The world
  does not need five slightly varying syntaxes for a substring function.
 If I
  had switched languages every time the web community recommended it, I
  would have rewritten a mountain of apps at least twice in the past five
  years.  What's next, a separate language to put periods at the end of
  sentences? Just my $.02.  That is all.
 
  Rich Wenger
  E-Resource Systems Manager, MIT Libraries
  rwen...@mit.edu
  617-253-0035
 
 
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of
  Joshua Welker
  Sent: Tuesday, July 30, 2013 9:56 AM
  To: CODE4LIB@listserv.nd.edu
  Subject: Re: [CODE4LIB] Python and Ruby
 
  I am already a big user of PHP for web apps, but PHP does not make a
  fantastic scripting language in my experience.
 
  Josh Welker
  Information Technology Librarian
  James C. Kirkpatrick Library
  University of Central Missouri
  Warrensburg, MO 64093
  JCKL 2260
  660.543.8022
 
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Riley Childs
  Sent: Tuesday, July 30, 2013 8:18 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Python and Ruby
 
  No mention of PHP?
 
  Sent from my iPhone
 
  On Jul 30, 2013, at 9:14 AM, Kurt Nordstrom doseofvitam...@gmail.com
  wrote:
 
   Whoohoo, late to the party!
  
   I like Python because I learned it first, and I haven't had a need to
   explore Ruby yet.
  
   I did briefly foray into learning Ruby in order to try to learn Rails,
   and I actually found that my background in Python sort of gave me
   brain-jam for learning Ruby, because the languages were so close
   together, but just different in some ways. So my mind would be 'oh, so
   it's just insert Python idiom here but then, it's not. If I tackle
   Ruby again, I will definitely try to 'empty my cup' first.
  
   -K
  
  
   On Tue, Jul 30, 2013 at 8:55 AM, Marc Chantreux m...@unistra.fr wrote:
  
   hello,
  
   Sorry comming late with it but:
  
   On Mon, Jul 29, 2013 at 10:43:33AM -0500, Joshua Welker wrote:
   Not intending to start a language flame war/holy war here, but in
   the library coding community, is there a particular reason to use
   Ruby over Python or vice-versa?
  
    Are these the only choices you have? Because I'd personally advise neither
    of them.
   
    I tested both of them before sticking to Perl just because
   
    * it is very pleasant when it comes to exploring and modifying
    data structures and strings (which library things are).
    * the ecosystem is brilliant: perl comes with lots of libraries and
    tools with a quality I haven't found in other languages.
   
    Of course, perl is not perfect and I really would like to use a
    modern emerging compiled language like go, rust, haskell or even
    something on the jvm (like clojure or the emerging perl6) but all of
    them lack libraries.
  
   HTH
   regards
   --
   Marc Chantreux
   Université de Strasbourg, Direction Informatique
   14 Rue René Descartes,
   67084  STRASBOURG CEDEX
   ☎: 03.68.85.57.40
   http://unistra.fr
   Don't believe everything you read on the Internet
  -- Abraham Lincoln
  
  
  
   --
   http://www.blar.net/kurt/blog/
 



[CODE4LIB] Machine tags and flickr commons

2013-07-10 Thread Ethan Gruber
There is an enormous body of open photographs contributed by a myriad of
libraries and museums to flickr.  Is anyone aware of any efforts to
associate machine tags with these photos, for example to georeference with
geonames machine tags, tag people with VIAF ids, or categorize with LCSH
ids?  A quick Google search turns up nothing.  There's a little bit of this
going on with Pleiades ids for ancient geography (
http://www.flickr.com/photos/tags/pleiades%3A*/), but there's enormous
potential in library-produced images.

I think it would be incredibly powerful to aggregate images of manuscripts
created by Thomas Jefferson (VIAF id: 41866059) across institutions that
have digitized and uploaded them to flickr.
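
A rough sketch of what harvesting by machine tag could look like against the
Flickr API, assuming a viaf: machine-tag namespace that is only a proposal
here, and substituting a real API key:

import requests

params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_API_KEY",           # requires a real Flickr API key
    "machine_tags": "viaf:id=41866059",  # hypothetical machine-tag convention
    "format": "json",
    "nojsoncallback": 1,
}
resp = requests.get("https://api.flickr.com/services/rest/", params=params, timeout=30)
for photo in resp.json().get("photos", {}).get("photo", []):
    print(photo["id"], photo.get("title"))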

Ethan


Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
You'd write some javascript to query the service with every keystroke, e.g.
http://id.loc.gov/authorities/suggest/?q=Hi replies with subjects beginning
with hi*  It looks like covo.js supports LCSH, so you could look into
that.
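
 For poking at the response shape from Python (in production the query would
 happen client-side in JavaScript), a minimal sketch assuming the OpenSearch
 suggestions format of [query, labels, counts, URIs]:

import requests

resp = requests.get("http://id.loc.gov/authorities/suggest/",
                    params={"q": "Hi"}, timeout=30)
query, labels, counts, uris = resp.json()
for label, uri in zip(labels, uris):
    print(label, "->", uri)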

Ethan


On Wed, Jun 5, 2013 at 9:13 AM, Joshua Welker jwel...@sbuniv.edu wrote:

 This would work, except I would need a way to get all the subjects rather
 than just biology. Any idea how to do that? I tried removing the
 querystring from the URL and changing Biology in the URL to  with no
 success.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Michael J. Giarlo
 Sent: Tuesday, June 04, 2013 7:05 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] LOC Subject Headings API

 How about id.loc.gov's OpenSearch-powered autosuggest feature?

 mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
 [Biology,[Biology,Biology Colloquium,Biology Curators'
 Group,Biology Databook Editorial Board (U.S.),Biology and Earth
 Sciences Teaching Institute,Biology and Management of True Fir in the
 Pacific Northwest Symposium (1981 : Seattle, Wash.),Biology and Resource
 Management Program (Alaska Cooperative Park Studies Unit),Biology and
 behavior series,Biology and environment (Macmillan Press),Biology and
 management of old-growth forests],[1 result,1 result,1 result,1
 result,1 result,1 result,1 result,1 result,1 result,1
 result],[http://id.loc.gov/authorities/subjects/sh85014203,;
 http://id.loc.gov/authorities/names/n79006962,;
 http://id.loc.gov/authorities/names/n90639795,;
 http://id.loc.gov/authorities/names/n85100466,;
 http://id.loc.gov/authorities/names/nr97041787,;
 http://id.loc.gov/authorities/names/n85276541,;
 http://id.loc.gov/authorities/names/n82057525,;
 http://id.loc.gov/authorities/names/n90605518,;
 http://id.loc.gov/authorities/names/nr2001011448,;
 http://id.loc.gov/authorities/names/no94028058;]]

 -Mike



 On Tue, Jun 4, 2013 at 7:51 PM, Joshua Welker jwel...@sbuniv.edu wrote:

  I did see that, and it will work in a pinch. But the authority file is
  pretty massive--almost 1GB-- and would be difficult to handle in an
  automated way and without completely killing my web app due to memory
  constraints while searching the file. Thanks, though.
 
  Josh Welker
 
 
  -Original Message-
  From: Bryan Baldus [mailto:bryan.bal...@quality-books.com]
  Sent: Tuesday, June 04, 2013 6:39 PM
  To: Code for Libraries; Joshua Welker
  Subject: RE: LOC Subject Headings API
 
  On Tuesday, June 04, 2013 6:31 PM, Joshua Welker [jwel...@sbuniv.edu]
  wrote:
  I am building an auto-suggest feature into our library's search box,
  and
  I am wanting to include LOC subject headings in my suggestions list.
  Does anyone know of any web service that allows for automated
  harvesting of LOC Subject Headings? I am also looking for name
 authorities, for that matter.
  Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I
  have spent a while Googling with no luck, but this seems like the sort
  of general-purpose thing that a lot of people would be interested in.
  I feel like I must be missing something. Any help is appreciated.
 
  Have you seen http://id.loc.gov/ with bulk downloads in various
  formats at http://id.loc.gov/download/
 
  I hope this helps,
 
  Bryan Baldus
  Senior Cataloger
  Quality Books Inc.
  The Best of America's Independent Presses
  1-800-323-4241x402
  bryan.bal...@quality-books.com
  eij...@cpan.org
  http://home.comcast.net/~eijabb/
 



Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
 Are you referring to hierarchical sets of terms, like "United
 States--History--War with Mexico, 1845-1848"?  This is an earlier
 established term of http://id.loc.gov/authorities/subjects/sh85140201 (now
 labeled "Mexican War, 1846-1848").  Ed Summers or Kevin Ford are in a
better position to discuss the change of terminology, but it looks like
LCSH is moving past this string-based hierarchy in favor of one expressed
in terms of linked data.

Ethan


On Wed, Jun 5, 2013 at 9:32 AM, Joshua Welker jwel...@sbuniv.edu wrote:

 I've seen those, but I can't figure out where on the id.loc.gov site
 there is actually a URL that provides a list of authority terms. All the
 links on the site seem to link to other pages within the site.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Dana Pearson
 Sent: Tuesday, June 04, 2013 6:42 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] LOC Subject Headings API

 Joshua,

 There are different formats at LOC:

 http://id.loc.gov/authorities/subjects.html

 dana


 On Tue, Jun 4, 2013 at 6:31 PM, Joshua Welker jwel...@sbuniv.edu wrote:

  I am building an auto-suggest feature into our library's search box,
  and I am wanting to include LOC subject headings in my suggestions
  list. Does anyone know of any web service that allows for automated
  harvesting of LOC Subject Headings? I am also looking for name
 authorities, for that matter.
  Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I
  have spent a while Googling with no luck, but this seems like the sort
  of general-purpose thing that a lot of people would be interested in.
  I feel like I must be missing something. Any help is appreciated.
 
  Josh Welker
  Electronic/Media Services Librarian
  College Liaison
  University Libraries
  Southwest Baptist University
  417.328.1624
 



 --
 Dana Pearson
 dbpearsonmlis.com



Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
I once put all of the LCSH headings into a local Solr index and used
TermsComponent to power autosuggest.  It was really fast.
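
A minimal sketch of that kind of lookup, assuming a local Solr core named
"lcsh" with the subject strings in a "heading" field and the stock /terms
handler enabled:

import requests

params = {
    "terms.fl": "heading",    # field holding the LCSH labels
    "terms.prefix": "biol",   # what the user has typed so far
    "terms.limit": 10,
    "wt": "json",
    "json.nl": "map",         # return {"term": count, ...} instead of a flat list
}
resp = requests.get("http://localhost:8983/solr/lcsh/terms", params=params, timeout=10)
suggestions = list(resp.json()["terms"]["heading"].keys())
print(suggestions)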

Ethan


On Wed, Jun 5, 2013 at 12:47 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 I realized since I made that comment that the API is designed to give the
 top 10 subject heading suggestions rather than all of them.

 So that part is fine. But I am once again unsure if the API will work for
 me. I am creating a mashup of several data sources for my auto-suggest
 feature, and I am having a hard time dynamically adding the results from
 the LOC Suggest API to the existing collection of data that is used to
 populate my jQuery UI Autocomplete field. Ideally, I'd like to be able to
 have all the LC Subject Heading data cached on my server so that I can
 build my autocomplete data source one time rather than having to deal with
 dynamically adding, sorting, etc. But then the problem I run into is that
 the LCSH master file is so big that it basically crashes the server.

 That's why I'm thinking I might have to give up on this project.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Michael J. Giarlo
 Sent: Wednesday, June 05, 2013 9:59 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] LOC Subject Headings API

 Josh,

 Can you say more about how the API isn't behaving as you expected it to?

 -Mike



 On Wed, Jun 5, 2013 at 10:37 AM, Joshua Welker jwel...@sbuniv.edu wrote:

  I went with this method and made some good progress, but the results
  the API was returning were not what I expected. I might have to give
  up on this project.
 
  Josh Welker
 
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of Ethan Gruber
  Sent: Wednesday, June 05, 2013 8:22 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] LOC Subject Headings API
 
  You'd write some javascript to query the service with every keystroke,
 e.g.
  http://id.loc.gov/authorities/suggest/?q=Hi replies with subjects
  beginning with hi*  It looks like covo.js supports LCSH, so you
  could look into that.
 
  Ethan
 
 
  On Wed, Jun 5, 2013 at 9:13 AM, Joshua Welker jwel...@sbuniv.edu
 wrote:
 
   This would work, except I would need a way to get all the subjects
   rather than just biology. Any idea how to do that? I tried removing
   the querystring from the URL and changing Biology in the URL to 
   with no success.
  
   Josh Welker
  
  
   -Original Message-
   From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
   Of Michael J. Giarlo
   Sent: Tuesday, June 04, 2013 7:05 PM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] LOC Subject Headings API
  
   How about id.loc.gov's OpenSearch-powered autosuggest feature?
  
   mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
   [Biology,[Biology,Biology Colloquium,Biology Curators'
   Group,Biology Databook Editorial Board (U.S.),Biology and Earth
   Sciences Teaching Institute,Biology and Management of True Fir in
   the Pacific Northwest Symposium (1981 : Seattle, Wash.),Biology
   and Resource Management Program (Alaska Cooperative Park Studies
   Unit),Biology and behavior series,Biology and environment
   (Macmillan Press),Biology and management of old-growth
   forests],[1
   result,1 result,1 result,1
   result,1 result,1 result,1 result,1 result,1 result,1
   result],[http://id.loc.gov/authorities/subjects/sh85014203,;
   http://id.loc.gov/authorities/names/n79006962,;
   http://id.loc.gov/authorities/names/n90639795,;
   http://id.loc.gov/authorities/names/n85100466,;
   http://id.loc.gov/authorities/names/nr97041787,;
   http://id.loc.gov/authorities/names/n85276541,;
   http://id.loc.gov/authorities/names/n82057525,;
   http://id.loc.gov/authorities/names/n90605518,;
   http://id.loc.gov/authorities/names/nr2001011448,;
   http://id.loc.gov/authorities/names/no94028058;]]
  
   -Mike
  
  
  
   On Tue, Jun 4, 2013 at 7:51 PM, Joshua Welker jwel...@sbuniv.edu
  wrote:
  
I did see that, and it will work in a pinch. But the authority
file is pretty massive--almost 1GB-- and would be difficult to
handle in an automated way and without completely killing my web
app due to memory constraints while searching the file. Thanks,
 though.
   
Josh Welker
   
   
-Original Message-
From: Bryan Baldus [mailto:bryan.bal...@quality-books.com]
Sent: Tuesday, June 04, 2013 6:39 PM
To: Code for Libraries; Joshua Welker
Subject: RE: LOC Subject Headings API
   
On Tuesday, June 04, 2013 6:31 PM, Joshua Welker
[jwel...@sbuniv.edu]
wrote:
I am building an auto-suggest feature into our library's search
box, and
I am wanting to include LOC subject headings in my suggestions list.
Does anyone know of any web service that allows for automated
harvesting of LOC Subject Headings? I am also looking for name

Re: [CODE4LIB] WorldCat Implements Content-Negotiation for Linked Data

2013-06-03 Thread Ethan Gruber
+1


On Mon, Jun 3, 2013 at 3:00 PM, Richard Wallis 
richard.wal...@dataliberate.com wrote:

 The Linked Data for the millions of resources in WorldCat.org is now
 available as RDF/XML, JSON-LD, Turtle, and Triples via content-negotiation.
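
 A quick sketch of what that content negotiation looks like from Python; the
 OCLC number below is arbitrary, purely for illustration.

import requests

uri = "http://www.worldcat.org/oclc/41266045"   # arbitrary example record
resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
print(resp.headers.get("Content-Type"))
print(resp.text[:500])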

 Details:

 http://dataliberate.com/2013/06/content-negotiation-for-worldcat/

 ~Richard.



Re: [CODE4LIB] Visualizing RDF graphs

2013-05-02 Thread Ethan Gruber
Wow, that's pretty cool.  I tried one of the dbpedia examples.  I look
forward to playing around with it with our data.

Ethan


On Thu, May 2, 2013 at 5:40 AM, raffaele messuti raffaele.mess...@gmail.com
 wrote:

 Ethan Gruber wrote:
  This looks like it does what I want to do, but it requires Virtuoso and a
  Scala environment.  I'm hesitant to dramatically modify my architecture
  just to accommodate a feature.  I think I favor something a little
 simpler.

 take a look at LodLive, it's a simple jquery plugin
 http://en.lodlive.it/
 https://github.com/dvcama/LodLive


 --
 raffaele



[CODE4LIB] Visualizing RDF graphs

2013-05-01 Thread Ethan Gruber
Hi all,

I have a fair amount of data in a triplestore, and I'd like to experiment
with different forms of visualization.  I have found a few libraries for
visualizing RDF graphs through Google, but they still seem relatively
rudimentary.  Does anyone on the list have recommendations?  I'm looking
for something that can use SPARQL.  I'd like to avoid creating duplicates
or derivatives of data, like GraphML, unless it is possible to render
GraphML which has been serialized from SPARQL results on the fly.
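
One possible approach, sketched with SPARQLWrapper and networkx (the endpoint
URL and query are placeholders): build the graph in memory straight from
SPARQL results, with no GraphML intermediary.

from SPARQLWrapper import SPARQLWrapper, JSON
import networkx as nx

sparql = SPARQLWrapper("http://localhost:3030/dataset/query")  # placeholder endpoint
sparql.setQuery("""
    SELECT ?s ?o WHERE { ?s <http://purl.org/dc/terms/relation> ?o } LIMIT 500
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

g = nx.Graph()
for row in results["results"]["bindings"]:
    g.add_edge(row["s"]["value"], row["o"]["value"])

print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
# g can then be drawn with matplotlib, or exported as JSON for a d3 front end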

Thanks,
Ethan


Re: [CODE4LIB] Visualizing RDF graphs

2013-05-01 Thread Ethan Gruber
Hey Mark,

This looks like it does what I want to do, but it requires Virtuoso and a
Scala environment.  I'm hesitant to dramatically modify my architecture
just to accommodate a feature.  I think I favor something a little simpler.

Thanks,
Ethan


On Wed, May 1, 2013 at 10:33 AM, Mark A. Matienzo
mark.matie...@gmail.comwrote:

 Hi Ethan,

 Have you looked at Payola? https://github.com/payola/Payola

 Mark

 --
 Mark A. Matienzo m...@matienzo.org
 Digital Archivist, Manuscripts and Archives, Yale University Library
 Technical Architect, ArchivesSpace


 On Wed, May 1, 2013 at 9:24 AM, Ethan Gruber ewg4x...@gmail.com wrote:
  Hi all,
 
  I have a fair amount of data in a triplestore, and I'd like to experiment
  with different forms of visualization.  I have found a few libraries for
  visualizing RDF graphs through Google, but they still seem relatively
  rudimentary.  Does anyone on the list have recommendations?  I'm looking
  for something that can use SPARQL.  I'd like to avoid creating duplicates
  or derivatives of data, like GraphML, unless it is possible to render
  GraphML which has been serialized from SPARQL results on the fly.
 
  Thanks,
  Ethan



Re: [CODE4LIB] tiff2pdf, then back to pdf?

2013-04-26 Thread Ethan Gruber
What's your use case in this scenario? Do you want to provide access to the
PDFs over the web or are you using them as your archival format?  You
probably don't want to use PDF to achieve both objectives.

Ethan
On Apr 26, 2013 5:11 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

 This works sometimes. Well, it does give me a new tiff file from the pdf
 all of the time, but it is not always anywhere near the same size as the
  original tiff. My guess is that maybe there is a flag or something that
  would help. Here is what I get with one file:


 ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.tif
 A001a.pdf
 ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.pdf
 A001b.tif
 ecorrado@ecorrado:~/Desktop/test$ ls -al
 total 361056
 drwxrwxr-x 2 ecorrado ecorrado 4096 Apr 26 17:07 .
 drwxr-xr-x 7 ecorrado ecorrado20480 Apr 26 16:54 ..
 -rw-rw-r-- 1 ecorrado ecorrado 38497046 Apr 26 17:07 A001a.pdf
 -rw-r--r-- 1 ecorrado ecorrado 38178650 Apr 26 17:07 A001a.tif
 -rw-rw-r-- 1 ecorrado ecorrado  5871196 Apr 26 17:07 A001b.tif


 In this case, the two tif files should be the same size. They are not even
 close. Maybe there is a flag to convert (besides compress) that I can use.
  FWIW: I tried three files; two are like this. For the other one, the resulting
  tiff is the same size as the original.
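
  A quick diagnostic sketch with Pillow: if the round-tripped TIFF comes back
  with different pixel dimensions, the PDF was probably rasterized at
  ImageMagick's default density on the way back out, which would account for
  the size gap.

  from PIL import Image

  for path in ("A001a.tif", "A001b.tif"):
      with Image.open(path) as im:
          print(path, im.size, im.mode, im.info.get("dpi"), im.info.get("compression"))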

 Edward





 On Fri, Apr 26, 2013 at 4:25 PM, Aaron Addison addi...@library.umass.edu
 wrote:

  Imagemagick's convert will do it both ways.
 
  convert a.tiff b.pdf
  convert b.pdf a.tiff
 
  If the pdf is more than one page, the tiff will be a multipage tiff.
 
  Aaron
 
  --
  Aaron Addison
  Unix Administrator
  W. E. B. Du Bois Library UMass Amherst
  413 577 2104
 
 
 
  On Fri, 2013-04-26 at 16:08 -0400, Edward M. Corrado wrote:
   Hi All,
  
   I have a need to batch convert many TIFF images to PDF. I'd then like
 to
  be
   able to discard the TIFF images, but I can only do that if I can create
  the
   original TIFF again from the PDF. Is this possible? If so, using what
  tools
   and how?
  
   tiff2pdf seems like a possible solution, but I can't find a
 corresponding
   pdf2tif program that reverses the process.
  
   Any ideas?
  
   Edward
 



Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-22 Thread Ethan Gruber
I have a follow-up:

By default, Jetty starts Fuseki with -Xmx1200M for heap.  Have you altered
this for production?  How many triples do you have and how often does your
endpoint process queries?  Our dataset won't be large at first (low
millions of triples), but we can reasonably expect 10,000+ SPARQL queries
per day.  That's not a lot by dbpedia standards, but I have no idea how
that compares to average LAM systems.

Thanks,
Ethan


On Thu, Feb 21, 2013 at 9:42 AM, Ethan Gruber ewg4x...@gmail.com wrote:

 Thanks everyone for the info. This soothed my apprehensions of running
 Fuseki in a production environment.

 Ethan


 On Wed, Feb 20, 2013 at 4:05 PM, Ross Singer rossfsin...@gmail.comwrote:

 I'll add that the LARQ plugin for Fuseki (which adds Lucene indexes) is
 pretty awesome, as well.

 -Ross.

 On Feb 20, 2013, at 3:57 PM, John Fereira ja...@cornell.edu wrote:

   I forgot about that.  That issue was created quite a while ago and I
 hadn't check on it in a long time.  I've found that Jetty has worked fine
 in our production environment so far.  As I wrote earlier, I have it
 connecting to a jena SDB that is used for a semantic web application (VIVO)
 that was developed here.  Although we have the semantic web application
 running on a different server than the SDB database I found the performance
 was fairly significantly improved by having the Fuseki server running on
 the same machine as the SDB.
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
 Of Ethan Gruber
  Sent: Wednesday, February 20, 2013 2:52 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Fuseki and other SPARQL servers
 
  Hi Hugh,
 
  I have investigated the possibility of deploying Fuseki as a war in
 Tomcat (
  https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
 how the default Jetty container would respond in production, but since you
 aren't having any problems with that deployment, I may go ahead and do that.
 
  Ethan
 
 
  On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless philomou...@gmail.com
 wrote:
 
  Hi Ethan!
 
  We've been using Jena/Fuseki in papyri.info for about a year now,
 iirc.
  We started with Mulgara, but switched. It's running in its own Jetty
  container in our system, but I've had no performance issues with it
  whatever.
 
  Best,
  Hugh
 
  On Feb 20, 2013, at 14:31 , Ethan Gruber ewg4x...@gmail.com wrote:
 
  Hi all,
 
  I have been playing around with Fuseki (
  http://jena.apache.org/documentation/serving_data/index.html) for a
  few months to get my feet wet with accessing and querying RDF.  I
  quite like it. I find it well documented and easy to set up.  We
  will soon deploy a SPARQL server in a production environment, and I
  would like to know if others on the list have experience with Fuseki
  in production, or have
  other
  recommendations.  Mulgara is off the table as it inexplicably
  conflicts with other apps installed in Tomcat.
 
  Thanks,
  Ethan
 





Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-21 Thread Ethan Gruber
Thanks everyone for the info. This soothed my apprehensions of running
Fuseki in a production environment.

Ethan


On Wed, Feb 20, 2013 at 4:05 PM, Ross Singer rossfsin...@gmail.com wrote:

 I'll add that the LARQ plugin for Fuseki (which adds Lucene indexes) is
 pretty awesome, as well.

 -Ross.

 On Feb 20, 2013, at 3:57 PM, John Fereira ja...@cornell.edu wrote:

   I forgot about that.  That issue was created quite a while ago and I
 hadn't check on it in a long time.  I've found that Jetty has worked fine
 in our production environment so far.  As I wrote earlier, I have it
 connecting to a jena SDB that is used for a semantic web application (VIVO)
 that was developed here.  Although we have the semantic web application
 running on a different server than the SDB database I found the performance
 was fairly significantly improved by having the Fuseki server running on
 the same machine as the SDB.
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Ethan Gruber
  Sent: Wednesday, February 20, 2013 2:52 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Fuseki and other SPARQL servers
 
  Hi Hugh,
 
  I have investigated the possibility of deploying Fuseki as a war in
 Tomcat (
  https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
 how the default Jetty container would respond in production, but since you
 aren't having any problems with that deployment, I may go ahead and do that.
 
  Ethan
 
 
  On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless philomou...@gmail.com
 wrote:
 
  Hi Ethan!
 
  We've been using Jena/Fuseki in papyri.info for about a year now, iirc.
  We started with Mulgara, but switched. It's running in its own Jetty
  container in our system, but I've had no performance issues with it
  whatever.
 
  Best,
  Hugh
 
  On Feb 20, 2013, at 14:31 , Ethan Gruber ewg4x...@gmail.com wrote:
 
  Hi all,
 
  I have been playing around with Fuseki (
  http://jena.apache.org/documentation/serving_data/index.html) for a
  few months to get my feet wet with accessing and querying RDF.  I
  quite like it. I find it well documented and easy to set up.  We
  will soon deploy a SPARQL server in a production environment, and I
  would like to know if others on the list have experience with Fuseki
  in production, or have
  other
  recommendations.  Mulgara is off the table as it inexplicably
  conflicts with other apps installed in Tomcat.
 
  Thanks,
  Ethan
 



Re: [CODE4LIB] You are a *pedantic* coder. So what am I?

2013-02-21 Thread Ethan Gruber
Look, I'm sure we can list the many ways different languages fail to meet
our expectations, but is this really a constructive line of conversation?

-1


On Thu, Feb 21, 2013 at 12:40 PM, Justin Coyne
jus...@curationexperts.comwrote:

 I did misspeak a bit.  You can override static methods in Java.  My major
 issue is that there is no getClass() within a static method, so when the
 static method is being run in the context of the inheriting class it is
 unaware of its own run context.

 For example: I want the output to be Hi from bar, but it's Hi from foo:

 class Foo {
   public static void sayHello() {
 hi();
   }
   public static void hi() {
 System.out.println(Hi from foo);
   }
 }

 class Bar extends Foo {

   public static void hi() {
 System.out.println(Hi from bar);
   }
 }

 class Test {
   public static void main(String [ ] args) {
 Bar.sayHello();
   }
 }


 -Justin



 On Thu, Feb 21, 2013 at 11:18 AM, Eric Hellman e...@hellman.net wrote:

  OK, pedant, tell us why you think methods that can be over-ridden are
  static.
  Also, tell us why you think classes in Java are not instances of
  java.lang.Class
 
 
  On Feb 18, 2013, at 1:39 PM, Justin Coyne jus...@curationexperts.com
  wrote:
 
   To be pedantic, Ruby and JavaScript are more Object Oriented than Java
   because they don't have primitives and (in Ruby's case) because classes
  are
   themselves objects.   Unlike Java, both Python and Ruby can properly
   override of static methods on sub-classes. The Java language made many
   compromises as it was designed as a bridge to Object Oriented
 programming
   for programmers who were used to writing C and C++.
  
   -Justin
  
 



Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-20 Thread Ethan Gruber
Wordpress?


On Wed, Feb 20, 2013 at 11:42 AM, Karen Coyle li...@kcoyle.net wrote:

 Shaun, you cannot decide whether github is a barrier to entry FOR ME (or
 anyone else), any more than you can decide whether or not my foot hurts.
 I'm telling you github is NOT what I want to use. Period.

 I'm actually thinking that a blog format would be nice. It could be pretty
 (poetry and beauty go together). Poems tend to be short, so they'd make a
 nice blog post. They could appear in the Planet blog roll. They could be
 coded by author and topic. There could be comments! Even poems as comments!
 The only down-side is managing users. Anyone have ideas on that?

 kc



 On 2/20/13 8:20 AM, Shaun Ellis wrote:

  (As a general rule, for every programmer who prefers tool A, and says
  that everybody should use it, there’s a programmer who disparages tool
  A, and advocates tool B. So take what we say with a grain of salt!)

 It doesn't matter what tools you use, as long as you and your team are
 able to participate easily, if you want to.  But if you want to attract
  contributions from a given development community, then choices should be
 balanced between the preferences of that community and what best serve the
 project.

 From what I've been hearing, I think there is a lot of confusion about
 GitHub.  Heck, I am constantly learning about new GitHub features, APIs,
 and best practices myself. But I find it to be an incredibly powerful
 platform for moving open source, distributed software development forward.
  I am not telling anyone to use GitHub if they don't want to, but I want to
 dispel a few myths I've heard recently:

 

 * Myth #1 : GitHub creates a barrier to entry.
 * To contribute to a project on GitHub, you need to use the
 command-line. It's not for non-coders.

 GitHub != git.  While GitHub was initially built for publishing and
 sharing code via integration with git, all GitHub functionality can be
 performed directly through the web gui.  In fact, GitHub can even be used
 as your sole coding environment. There are other tools in the eco-system
 that allow non-coders to contribute documentation, issue reporting, and
 more to a project.

 

 * Myth #2 : GitHub is for sharing/publishing code.
  * It would be fun to have a wiki for more durable poetry (github
 unfortunately would be a barrier to many).

 GitHub can be used to collaborate on and publish other types of content
 as well.  For example, GitHub has a great wiki component* (as well as a
  website component).  In a number of ways, it has less of a barrier to entry
 than our Code4Lib wiki.

 While the path of least resistance requires a repository to have a
 wiki, public repos cost nothing and can consist of a simple README file.
  The wiki can be locked down to a team, or it can be writable by anyone
 with a github account.  You don't need to do anything via command-line,
 don't need to understand git-flow, and you don't even need to learn wiki
 markup to write content. All you need is an account and something to say,
 just like any wiki. Log in, go to the anti-harassment policy wiki, and see
 for yourself:
  https://github.com/code4lib/antiharassment-policy/wiki

 * The github wiki even has an API (via Gollum) that you can use to
 retrieve raw or formatted wiki content, write new content, and collect
 various meta data about the wiki as a whole:
  https://github.com/code4lib/antiharassment-policy/wiki/_access

 

 * Myth #3 : GitHub is person-centric.
  (And as a further aside, there’s plenty to dislike about github as
  well, from it’s person-centric view of projects (rather than
  team-centric)...

 Untrue. GitHub is very team centered when using organizational accounts,
 which formalize authorization controls for projects, among other things:
  https://github.com/blog/674-introducing-organizations

 

 * Myth #4 : GitHub is monopolizing open source software development.
  ... to its unfortunate centralizing of so much free/open
  source software on one platform.)

 Convergence is not always a bad thing. GitHub provides a great, free
 service with lots of helpful collaboration tools beyond version control.
  It's natural that people would flock there, despite having lots of other
 options.

 

 -Shaun







 On 2/19/13 5:35 PM, Erik Hetzner wrote:

 At Sat, 16 Feb 2013 06:42:04 -0800,
 Karen Coyle wrote:


 gitHub may have excellent startup documentation, but that startup
 documentation describes git in programming terms mainly using *nx
 commands. If you have never had to use a version control system (e.g. if
 you do not write code, especially in a shared environment), clone
 push pull are very poorly described. The documentation is all in
 terms of *nx commands. Honestly, anything where this is in the
 

[CODE4LIB] Fuseki and other SPARQL servers

2013-02-20 Thread Ethan Gruber
Hi all,

I have been playing around with Fuseki (
http://jena.apache.org/documentation/serving_data/index.html) for a few
months to get my feet wet with accessing and querying RDF.  I quite like
it. I find it well documented and easy to set up.  We will soon deploy a
SPARQL server in a production environment, and I would like to know if
others on the list have experience with Fuseki in production, or have other
recommendations.  Mulgara is off the table as it inexplicably conflicts
with other apps installed in Tomcat.

Thanks,
Ethan


Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-20 Thread Ethan Gruber
TDB, as per the startup instruction: fuseki-server --loc=DB /DatasetPathName

Ethan


On Wed, Feb 20, 2013 at 3:02 PM, Ross Singer rossfsin...@gmail.com wrote:

 On Feb 20, 2013, at 2:52 PM, Ethan Gruber ewg4x...@gmail.com wrote:

  Hi Hugh,
 
  I have investigated the possibility of deploying Fuseki as a war in
 Tomcat (
  https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
 how
  the default Jetty container would respond in production, but since you
  aren't having any problems with that deployment, I may go ahead and do
 that.

 Fuseki/Jetty will have no problems scaling, it's what the Talis Platform
 used for large datasets.  I also ran a large dataset for quite a while with
 it.

 Which backend are you using?  TDB?  SDB?

 -Ross.

 
  Ethan
 
 
  On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless philomou...@gmail.com
 wrote:
 
  Hi Ethan!
 
  We've been using Jena/Fuseki in papyri.info for about a year now, iirc.
  We started with Mulgara, but switched. It's running in its own Jetty
  container in our system, but I've had no performance issues with it
  whatever.
 
  Best,
  Hugh
 
  On Feb 20, 2013, at 14:31 , Ethan Gruber ewg4x...@gmail.com wrote:
 
  Hi all,
 
  I have been playing around with Fuseki (
  http://jena.apache.org/documentation/serving_data/index.html) for a
 few
  months to get my feet wet with accessing and querying RDF.  I quite
 like
  it. I find it well documented and easy to set up.  We will soon deploy
 a
  SPARQL server in a production environment, and I would like to know if
  others on the list have experience with Fuseki in production, or have
  other
  recommendations.  Mulgara is off the table as it inexplicably conflicts
  with other apps installed in Tomcat.
 
  Thanks,
  Ethan
 



Re: [CODE4LIB] Getting started with Ruby and library-ish data (was RE: [CODE4LIB] You *are* a coder. So what am I?)

2013-02-18 Thread Ethan Gruber
The language you choose is somewhat dependent on the data you're working
with.  I don't find that Ruby or PHP are particularly good at dealing with
XML. They're passable for data manipulation and migration, but I wouldn't
use them to render large collections of structured XML data, like EAD or
TEI collections, or whatever.


Ethan


On Mon, Feb 18, 2013 at 8:52 AM, Jason Stirnaman jstirna...@kumc.eduwrote:

  This is a terribly distorted view of Ruby: "If you want to make web pages,
  learn Ruby." And you don't need to learn Rails to get the benefit of Ruby's
 awesomeness. But, everyone will have their own opinions. There's no
 accounting for taste.

 For anyone interested in learning to program and hack around with library
 data or linked data, here are some places to start (heavily biased toward
 the elegance of Ruby):

 http://wiki.code4lib.org/index.php/Working_with_MaRC
 https://delicious.com/jstirnaman/ruby+books
 https://delicious.com/jstirnaman/ruby+tutorials
 http://rdf.rubyforge.org/

 Jason

 Jason Stirnaman
 Digital Projects Librarian
 A.R. Dykes Library
 University of Kansas Medical Center
 913-588-7319

 
 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Joe
 Hourcle [onei...@grace.nascom.nasa.gov]
 Sent: Sunday, February 17, 2013 12:52 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] You *are* a coder. So what am I?

 On Feb 17, 2013, at 11:43 AM, John Fereira wrote:

  I have been writing software professionally since around 1980 and
  first encountered perl in the early 1990s or so and have *always* disliked
 it.   Last year I had to work on a project that was mostly developed in
 perl and it reminded me how much I disliked it.  As a utility language, and
 one that I think is good for beginning programmers (especially for those
 working in a library) I'd recommend PHP over perl every time.

 I'll agree that there are a few aspects of Perl that can be confusing, as
  some functions will change behavior depending on context, and there were a
  lot of bad code examples out there.*

 ... but I'd recommend almost any current mainstream language before
 recommending that someone learn PHP.

 If you're looking to make web pages, learn Ruby.

 If you're doing data cleanup, Perl if it's lots of text, Python if it's
 mostly numbers.

  I should also mention that the early 1990s would have been Perl 4 ...
 and unfortunately, most people who learned Perl never learned Perl 5.  It's
 changed a lot over the years.  (just like PHP isn't nearly as insecure as
 it used to be ... and actually supports placeholders so you don't end up
 with SQL injections)

 -Joe



Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-01 Thread Ethan Gruber
Google is more useful than any reference book to find answers to
programming problems.
On Nov 1, 2012 4:25 PM, Bohyun Kim k...@fiu.edu wrote:

 Hi all code4lib-bers,

 As coders and coding librarians, what is ONE tool and/or resource that you
 recommend to newbie coders in a library (and why)?  I promise I will create
 and circulate the list and make it into a Code4Lib wiki page for collective
 wisdom.  =)

 Thanks in advance!
 Bohyun

 ---
 Bohyun Kim, MA, MSLIS
 Digital Access Librarian
 bohyun@fiu.edu
 305-348-1471
 Medical Library, College of Medicine
 Florida International University
 http://medlib.fiu.edu
 http://medlib.fiu.edu/m (Mobile)



[CODE4LIB] Using dbpedia to generate EAC-CPF collections

2012-10-03 Thread Ethan Gruber
Hi all,

 In the last few weeks, I have undertaken a project to generate EAC-CPF stubs using
dbpedia and VIAF data for the Roman emperors and their relations.  There's
a lot of great information available through dbpedia, and since it's
available in RDF, I put together a PHP script that can start at one point
in dbpedia (e.g., http://dbpedia.org/resource/Augustus) and traverse
through its relations to create a network of stubs using links to parents,
children, spouses, influences, successors, and predecessors provided in the
RDF.  Left unchecked, the script would crawl forward through the Byzantine
period to spread laterally (chronologically speaking) to generate a network
of the ruling hierarchy of the West up to the modern period.  It also goes
backwards to the successors of Alexander the Great.  For all I know, it
goes back through all of the Egyptian dynasties to Narmer ca. 3000 BC, but
I haven't let the script go that far.
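
A minimal sketch of the crawling idea in Python (the actual script linked
below is PHP, and the dbpedia ontology property names here are only a guess
at the ones it follows):

from rdflib import Graph, Namespace, URIRef

DBO = Namespace("http://dbpedia.org/ontology/")

def related(resource_uri):
    g = Graph()
    g.parse(resource_uri)   # dbpedia content-negotiates the resource URI to RDF
    links = set()
    for prop in (DBO.successor, DBO.predecessor, DBO.parent, DBO.child, DBO.spouse):
        for obj in g.objects(URIRef(resource_uri), prop):
            links.add(str(obj))
    return links

print(related("http://dbpedia.org/resource/Augustus"))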

The script is fairly generalizable, and can begin at any dbpedia resource.
It's available at
https://github.com/ewg118/xEAC/blob/master/misc/dbpedia-to-eac.php

I should also note that this is a work in progress.  To execute the script,
you'll need to place a temp folder in the same place you download/execute
it (for writing EAC records).

At a glance, here's what it does:

-Creates nameEntries for all of the names available in various languages in
dbpedia
-If a VIAF ID is available in the RDF, the script will pull some alternate
record IDs from VIAF, as well as birth and death dates
-Can pull in subjects, occupations, and related resources on the web
-Generate corporate/personal/family relations given the
parents/children/spouses/influences/successors/predecessors/dynasties
linked in dbpedia.  These relations are added into an array which
continually processes until presumably it reaches the end of time.
-You can specify an end record to attempt to break this chain, but I
cannot guarantee that it'll work.  Anastasius (emperor of Rome ca. 500 AD)
does actually successfully terminate the Augustus chain.
-Import birth and death places (and associated birth and death dates, if
available)

I think that these stubs are a good starting point for handing off the
management of EAC content to subject specialists who can add chronological
and geographical context.  I wrote a bit more about this script and the
process applied to xEAC, an XForms-based engine for creating, editing,
managing, and publishing EAC-CPF collections at
http://eaditor.blogspot.com/2012/10/using-dbpedia-to-jumpstart-eac-cpf.html

There's a prototype collection of the Roman Empire; if anyone is interested
in taking a look at it, drop me a line off the list.

Ethan


Re: [CODE4LIB] Displaying TGN terms

2012-09-17 Thread Ethan Gruber
I use Geonames for this sort of thing a lot.  With cities and
administrative divisions being offered in a machine-readable format, it's
pretty easy to encode places in a format that adheres to AACR2 or other
cataloging rules.  There are of course problems disambiguating city names
when no country is given, but I get a pretty accurate response in general:
probably greater than 76% when I have both the city and country or city and
geographic region.
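
A rough sketch of such a lookup against the GeoNames search API; you need
your own GeoNames username, and the place, country, and region here are
invented for illustration.

import requests

params = {
    "q": "Springfield",
    "country": "US",
    "adminCode1": "IL",      # narrowing by region helps disambiguation
    "maxRows": 1,
    "username": "YOUR_GEONAMES_USERNAME",
}
resp = requests.get("http://api.geonames.org/searchJSON", params=params, timeout=30)
hits = resp.json().get("geonames", [])
if hits:
    top = hits[0]
    print(top["name"], top["adminName1"], top["countryName"], top["geonameId"])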

Ethan

On Mon, Sep 17, 2012 at 3:16 PM, Eric Lease Morgan emor...@nd.edu wrote:

 On Sep 17, 2012, at 3:12 PM, ddwigg...@historicnewengland.org wrote:

  But I'm having trouble coming up with an algorithm that can consistently
 spit these out in the form we'd want to display given the data available in
 TGN.


 A dense but rich, just-published article from D-Lib Magazine about
 geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text
 Archives -- may give some guidance. From the conclusion:

  Spatial information is playing an increasing role in the access
  and mediation of information, driving interest in methods capable
  of extracting spatial information from the textual contents of
  large document archives. Automated approaches, even using fairly
  basic algorithms, can achieve upwards of 76% accuracy when
  recognizing, disambiguating, and converting to mappable
  coordinates the references to individual cities and landmarks
  buried deep within the text of a document. The workflow of a
  typical geocoding system involves identifying potential
  candidates from the text, checking those candidates for potential
  matches in a gazetteer, and disambiguating and confirming those
  candidates -- http://bit.ly/Ufl5k9

 --
 ELM



Re: [CODE4LIB] Timelines (was: visualize website)

2012-08-31 Thread Ethan Gruber
There's also timemap (SIMILE Timeline + mapping libraries like Google Maps
or OpenLayers) if you need to display geography in conjunction to
chronology.  http://code.google.com/p/timemap/

Ethan

On Fri, Aug 31, 2012 at 9:27 AM, Walter Lewis wltrle...@gmail.com wrote:

 On 2012-08-30, at 1:03 PM, miles stauffer wrote:

  Is this what you are looking for?
  http://selection.datavisualization.ch/

 The site points to TimelineJS at http://timeline.verite.co/ for timeline
 visualization.
 There is also the widget from the SIMILE project at MIT at
 http://www.simile-widgets.org/timeline/

 Are there other suggestions for tools for time line visualizations?

 Walter



Re: [CODE4LIB] Archival Software

2012-08-09 Thread Ethan Gruber
I find Omeka to be stronger in the area of collections publication and
exhibition than hardcore archival management due to the rather rudimentary
Dublin Core metadata foundation.  You can make other element sets, but it's
not a perfect solution.

Ethan

On Thu, Aug 9, 2012 at 2:57 PM, Kaile Zhu kz...@uco.edu wrote:

 How about Omeka?  Need to consider the library standards because
 eventually you will have to make your archival collection searchable.  -
 Kelly

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Lisa Gonzalez
 Sent: Thursday, August 09, 2012 1:38 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Archival Software

 Related to the CLIR Report, the wiki version is a little easier to
 navigate:

 http://archivalsoftware.pbworks.com/w/page/13600254/FrontPage


 Lisa Gonzalez
 Electronic Resources Librarian
 Catholic Theological Union
 5401 S. Cornell Ave.
 Chicago, IL 60615
 773-371-5463
 lgonza...@ctu.edu






 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Nathan Tallman
 Sent: Thursday, August 09, 2012 12:00 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Archival Software

 As an archivist, this is still a very broad response.

 Are you looking to manage archival collections (accessioning, arrangement
 and description, producing finding aids, etc.)? If so, Archivists Toolkit
 or Archon may work for you. I'm not sure what you mean by university
 historical information, perhaps ready-reference type guides?
 There are a plethora of web options for this. Are you looking to manage
 digital assets? Then a digital repository, such as Fedora or Dspace is in
 order.

 Although it's a bit out of date at this point, you may want to look at
 Lisa Spiro's 2009 report, Archival Management Software 
 http://www.clir.org/pubs/reports/spiro/. Also, check out Carol Bean's
 blog, BeanWorks. She has a post about comparing digital asset managers 
 http://beanworks.clbean.com/2010/05/creating-a-comparison-matrix/ (and
 also has useful related links).

 Best,
 Nathan

 On Thu, Aug 9, 2012 at 10:42 AM, Joselito Dela Cruz
  jdelac...@hodges.edu wrote:


  We are looking to centralize the university historical information and
  archives.
 
 
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of Matthew Sherman
  Sent: Thursday, August 09, 2012 10:38 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Archival Software
 
  I think you need to provide a little more context as to what you are
  trying to do.  The trouble is that the term archive is used in a
  variety of different ways right now so we need to know what you mean
  to be able to give you the best suggestions.
 
  On Thu, Aug 9, 2012 at 9:31 AM, Joselito Dela Cruz
   jdelac...@hodges.edu wrote:
 
   Any suggestions for inexpensive  easy to use archival software?
  
   Thanks,
  
   Jay Dela Cruz, MLIS
   Electronic Resources Librarian
   Hodges University | 2655 Northbrooke Drive, Naples, FL 34119-7932
   (239) 598-6211 | (800) 466-8017 x 6211 | f. (239) 598-6250
   jdelac...@hodges.edu | www.hodges.edu
  
 


 **Bronze+Blue=Green** The University of Central Oklahoma is Bronze, Blue,
 and Green! Please print this e-mail only if absolutely necessary!

 **CONFIDENTIALITY** This e-mail (including any attachments) may contain
 confidential, proprietary and privileged information. Any unauthorized
 disclosure or use of this information is prohibited.



[CODE4LIB] Reminder: THATCamp for Computational Archaeology registration deadline is TODAY

2012-06-10 Thread Ethan Gruber
Today, June 10 is the final day to register for THATCamp CAA-NA, an
unconference for computer applications in archaeology.  The free event will
be held Friday, August 10 in the Harrison-Small Special Collections Library
of the University of Virginia, Charlottesville.  It is sponsored by the
Computer Applications and Quantitative Methods in Archaeology - North
America chapter, the University of Virginia Library *Year of Metadata* and
the Fiske Kimball Fine Arts Library.  This is a great opportunity to
interact with archaeologists, students, museum and library professionals,
and computer and information scientists operating within cultural heritage!

The general themes of the event are as follows:


   1. Simulating the Past
   2. Spatial Analysis
   3. Data Modelling & Sharing
   4. Data Analysis, Management, Integration & Visualisation
   5. Geospatial Technologies
   6. Field & Lab Recording
   7. Theoretical Approaches & Context of Archaeological Computing
   8. Human Computer Interaction, Multimedia, Museums

More info: http://caana2012.thatcamp.org/

Follow us on twitter at @THATCampCAANA or for email inquiries, use
thatcampca...@gmail.com

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] Best way to process large XML files

2012-06-08 Thread Ethan Gruber
Saxon is really, really efficient with large files.  I don't really have
any benchmark stats available, but I have gotten noticeably better
performance from Saxon/XSLT2 than PHP with DOMDocument or SimpleXML or
nokogiri and hpricot in Ruby.
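
For the streaming side of Kyle's question below, a hedged sketch of one way to walk a huge XML file without loading it all into memory (Python/lxml rather than Saxon; the tag name and crosswalk step are placeholders):

from lxml import etree

def crosswalk(record):
    print(record.findtext("title"))   # stand-in for the real crosswalk logic

for event, elem in etree.iterparse("big-file.xml", events=("end",), tag="record"):
    crosswalk(elem)
    elem.clear()                          # free this record's subtree
    while elem.getprevious() is not None:
        del elem.getparent()[0]           # drop already-processed siblings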

Ethan

On Fri, Jun 8, 2012 at 2:36 PM, Kyle Banerjee baner...@orbiscascade.org wrote:

 I'm working on a script that needs to be able to crosswalk at least a
 couple hundred XML files regularly, some of which are quite large.

 I've thought of a number of ways to go about this, but I wanted to bounce
 this off the list since I'm sure people here deal with this problem all the
 time. My goal is to make something that's easy to read/maintain without
 pegging the CPU and consuming too much memory.

 The performance and load I'm seeing from running the files through LibXML
 and SimpleXML on the large files is completely unacceptable. SAX is not out
 of the question, but I'm trying to avoid it if possible to keep the code
 more compact and easier to read.

 I'm tempted to streamedit out all line breaks since they occur in
 unpredictable places and put new ones at the end of each record into a temp
 file. Then I can read the temp file one line at a time and process using
 SimpleXML. That way, there's no need to load giant files into memory,
 create huge arrays, etc and the code would be easy enough for a 6th grader
 to follow. My proposed method doesn't sound very efficient to me, but it
 should consume predictable resources which don't increase with file size.

 How do you guys deal with large XML files? Thanks,

 kyle

 rantWhy the heck does the XML spec require a root element,
 particularly since large files usually consist of a large number of
 records/documents? This makes it absolutely impossible to process a file of
 any size without resorting to SAX or string parsing -- which takes away
 many of the advantages you'd normally have with an XML structure. /rant

 --
 --
 Kyle Banerjee
 Digital Services Program Manager
 Orbis Cascade Alliance
 baner...@uoregon.edu baner...@orbiscascade.org / 503.999.9787



Re: [CODE4LIB] Studying the email list (Charcuterie Spectrum)

2012-06-05 Thread Ethan Gruber
That begs the question, what is the official Roy Tennant position on baloney
vs. bologna?  May I suggest a viaf-like resource for food, in which I may
prefer the baloney label while allowing my data to be cross-searchable with
bologna records?  Is there an RDF ontology for this???

On Tue, Jun 5, 2012 at 4:02 PM, Kevin S. Clarke kscla...@gmail.com wrote:

 On Tue, Jun 5, 2012 at 3:55 PM, BWS Johnson abesottedphoe...@yahoo.com
 wrote:

Bacon   == Seal of Approval
Bologna == Seal of Disapproval
Salami  == Seal of No Approval Needed
 
 
  This has some serious flaws. I'm concerned about the relationships
 between the desirability of the bespoke seals as they relate to the appeal
 of the meats themselves. While yea, bacon is nearly universal in its
 appeal, that one seems on the mark. Alas, bologna as the seal of
 disapproval might fall a bit short. While one might jump to proffer spam in
 its place, Hawai'ians quite like spam, leaving us all in a bit of a
 quandry. Olive loaf, perhaps? And while salame is a most excellent meat,
 perhaps fois gras more aptly conveys the aboutness of not giving a damn
 about one's approval or lack thereof.
 
   What say you cataloguing mafia? Surely we must honour the aboutness
 of meat and approval lest we needs OCLC to intervene more often than is
 strictly necessary in our mortal affairs.

 I'm vegan now, but having eaten it as a child, may I suggest chicken
 livers for the Seal of Disapproval? Blech!  And, as a vegan, I'd
 stretch bounds of the Seal of No Approval Needed to tempeh.  That
 seems appropriate.

 Fwiw...
 Kevin



[CODE4LIB] THATCamp for Computational Archaeology registration extended to June 10

2012-06-04 Thread Ethan Gruber
Dear all,

The registration deadline for THATCamp for Computational Archaeology has
been extended to June 10.  Registration is free and first-come, first
serve.  The THATCamp will be hosted August 10 at the University of Virginia
in Charlottesville.  It is co-sponsored by the Computer Applications and
Quantitative Methods in Archaeology - North America chapter and the U. Va.
Libraries Year of Metadata/Fiske Kimball Fine Arts Library.  For more
information, check out: http://caana2012.thatcamp.org/about-thatcampcaa-na/

This will be a great opportunity to meet new people and share ideas within
the realms of archaeology and technology.  You can follow us on twitter at
@THATCampCAANA.  Look forward to seeing you there!

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] triple stores ???

2012-05-29 Thread Ethan Gruber
For those using these big triplestores, how are you putting data in?  I'm
looking for a triplestore which supports SPARQL update.  Any comments
anyone can add on this interface will be useful.
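
A minimal sketch of what "putting data in" over the SPARQL 1.1 Update protocol can look like: POST an INSERT DATA request to the store's update endpoint. The endpoint URL below is hypothetical; each triplestore exposes its own.

import requests

UPDATE_ENDPOINT = "http://localhost:8080/store/update"   # hypothetical

update = """
PREFIX dcterms: <http://purl.org/dc/terms/>
INSERT DATA {
  <http://example.org/athens> dcterms:isPartOf <http://example.org/attica> .
}
"""

resp = requests.post(UPDATE_ENDPOINT,
                     data=update.encode("utf-8"),
                     headers={"Content-Type": "application/sparql-update"})
resp.raise_for_status()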

Ethan
On May 29, 2012 4:12 PM, Ravi Shankar rshan...@stanford.edu wrote:

 Thanks, Stefano. The Europeana report seems to be quite comprehensive. It
 is funny that I've searched earlier for triple store comparisons with more
 explicit parameters 'rdf triple store comparison', and the Europeana report
 appeared in the third page of the search results. The 'triple' in the
 search seems to be the culprit -- a clear need for more semantics in the
 search engine ;)

 Cheers,
 Ravi

 On May 29, 2012, at 1:01 AM, Stefano Bargioni wrote:

  Maybe a G search can help to find comparisons:
 
 http://www.google.com/search?sugexp=chrome,mod=4&sourceid=chrome&ie=UTF-8&q=4store+Virtuoso+Jena+SDB++Mulgara
  The result includes your post... added 8 minutes ago.
  Stefano
 
  On 29/mag/2012, at 09.12, Ravi Shankar wrote:
 
  We (DLSS at Stanford Libraries) are planning to use a triple store for
 storing and retrieving annotations (in RDF) on digital objects. We are
 currently looking at open-source triple stores such as 4store, Virtuoso,
 Jena SDB and Mulgara. Are you currently using a triple store or
 contemplating on using one? How would you evaluate 'your' triple store
 along the lines of 1) ease of setup, 2) scalability, 3) query performance,
 3) bulk load performance, 4) access api, 5) documentation and 6) community
 support?
 
  Highly appreciate your thoughts, ideas and suggestions.
 
  Thanks,
  Ravi Shankar
 
 
 
  __
  Your 5x1000 to the Patronato di San Girolamo della Carita' is a simple
 gesture but one of great value.
  Your signature will help priests be closer to the needs of all of us.
  Help us train priests and seminarians from the 5 continents by indicating
 the tax code 97023980580 in your income tax return.



Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
Thanks.  I have been working on a system that allows editing of RDF in web
forms, creating linked data connections in the background, publishing to
eXist and Solr for dissemination, and will eventually integrate operation
with an RDF triplestore/SPARQL, all with Tomcat apps.  I'm not sure it is
possible to create, manage, and deliver our content with node.js, but I was
told by the project manager that Apache, Java, and Tomcat were showing
signs of age.  I'm not so sure about this considering the prevalence of
Tomcat apps both in libraries and industry.  I happen to be very fond of
Solr, and it seems very risky to start over in node.js, especially since I
can't be certain the end product will succeed.  I prefer to err on the side
of stability.

If anyone has other thoughts about the future of Tomcat applications in the
library, or more broadly cultural heritage informatics, feel free to jump
in.  Our data is exclusively XML, so LAMP/Rails aren't really options.

Ethan

On Tue, May 8, 2012 at 10:03 AM, Nate Vack njv...@wisc.edu wrote:

 On Mon, May 7, 2012 at 10:17 PM, Ethan Gruber ewg4x...@gmail.com wrote:

  It was recently suggested to me that a project I am working on may adopt
  node.js for its architecture (well, be completely re-written for
 node.js).
  I don't know anything about node.js, and have only heard of it in some
  passing discussions on the list.  I'd like to know if anyone on code4lib
  has experience developing in this platform, and what their thoughts are
 on
  it, positive or negative.

 I've only played a little bit, but my take is: you'll have more parts
 to build than with other systems. If you need persistent connections,
 it's gonna be neat; if you don't, it's probably not worth the bother.

 The Peepcode screencasts on Node:

 https://peepcode.com/screencasts/node

 are probably worth your time and money.

 -n



Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
Thanks, it really helps to get a list of projects using it so I can get a
better sense of what's possible.

On Tue, May 8, 2012 at 10:23 AM, Cary Gordon listu...@chillco.com wrote:

 I have done some work with node building apps in the areas of mapping
 and communication (chat, etc.).

 Looking at the list at

 https://github.com/joyent/node/wiki/Projects,-Applications,-and-Companies-Using-Node
 ,
 the emphasis on real-time stands out.

 Node is fast and lightweight, and is well suited to applications that
 need speed and can take advantage of multiple channels.

 Thanks,

 Cary

 On Mon, May 7, 2012 at 8:17 PM, Ethan Gruber ewg4x...@gmail.com wrote:
  Hi all,
 
  It was recently suggested to me that a project I am working on may adopt
  node.js for its architecture (well, be completely re-written for
 node.js).
  I don't know anything about node.js, and have only heard of it in some
  passing discussions on the list.  I'd like to know if anyone on code4lib
  has experience developing in this platform, and what their thoughts are
 on
  it, positive or negative.
 
  Thanks,
  Ethan



 --
 Cary Gordon
 The Cherry Hill Company
 http://chillco.com



Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,
but I feel like I'm missing some sort of inside joke here.

Thanks for the info.  To clarify, I don't develop in java, but deploy
well-established java-based apps in Tomcat, like Solr and eXist (and am
looking into a java triplestore to run in Tomcat) and write scripts to make
these web services interact in whichever language seems to be the most
appropriate.  Node looks like it may be interesting to play around with,
but I'm wary of having to learn something completely new, jettisoning every
application and language I am experienced with, to put a new project into
production in the next 4-8 weeks.

Ethan

On Tue, May 8, 2012 at 1:15 PM, Nate Vack njv...@wisc.edu wrote:

 On Tue, May 8, 2012 at 11:45 AM, Ross Singer rossfsin...@gmail.com
 wrote:
  On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:
 
  in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
 
  ^^ Really?  Nobody's going to take the bait with this one?

 I can't see why they would; parsing XML in ruby is simply not possible.

 ;-)

 -n



Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
I once had benchmarks comparing XML processing with Saxon/XSLT2 vs hpricot
and nokogiri, and Saxon is the most efficient XML processor there is.  I
don't have that data any more though, but that's why I'm not a proponent of
using PHP/Ruby for delivering and manipulating XML content.  Each platform
has its pros and cons.  I didn't mean to ruffle any feathers with that
statement.

On Tue, May 8, 2012 at 2:18 PM, Ross Singer rossfsin...@gmail.com wrote:

 On May 8, 2012, at 2:01 PM, Ethan Gruber wrote:

  For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,

 So then explain why LAMP/Rails aren't really options.

 It's hard to see how anybody can recommend node.js (or any other stack)
 based on this statement because without knowing _why_ these are inadequate.
  My guess is that node's XML libraries are also libXML based, just like
 pretty much any other C-based language.

  but I feel like I'm missing some sort of inside joke here.
 
  Thanks for the info.  To clarify, I don't develop in java, but deploy
  well-established java-based apps in Tomcat, like Solr and eXist (and am
  looking into a java triplestore to run in Tomcat) and write scripts to
 make
  these web services interact in whichever language seems to be the most
  appropriate.  Node looks like it may be interesting to play around with,
  but I'm wary of having to learn something completely new, jettisoning
 every
  application and language I am experienced with, to put a new project into
  production in the next 4-8 weeks.

 Eh, if your window is 4-8 weeks, then I wouldn't be considering node for
 this project.  It does, however, sound like you could really use a new
 project manager, because the one you have sounds terrible.

 -Ross.

 
  Ethan
 
  On Tue, May 8, 2012 at 1:15 PM, Nate Vack njv...@wisc.edu wrote:
 
  On Tue, May 8, 2012 at 11:45 AM, Ross Singer rossfsin...@gmail.com
  wrote:
  On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:
 
  in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
 
  ^^ Really?  Nobody's going to take the bait with this one?
 
  I can't see why they would; parsing XML in ruby is simply not possible.
 
  ;-)
 
  -n
 



Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
The 4-8 week deadline is more self-imposed than anything.  The plan is (or
was) to deploy the new version of this project by mid-late summer.  It is
already under way, with a working prototype, and I can probably mostly
finish it in 80-120 hours of solid work.  I want to deploy it as soon as we
can because other bigger, sexier projects depend on RDF delivered from this
project.  If it takes six months to completely rewrite this project for
node, or any non-java platform with which I have less experience, we've
thrown a monkey wrench into the development of our other projects.

As for triplestores:

Mulgara is on my list to check out, as is sesame.  Does mulgara support
SPARQL Update yet?  In theory, one should be able to post updates directly
from XForms into a triplestore which supports SPARQL Update.  Maybe this
warrants a separate thread.

On Tue, May 8, 2012 at 3:39 PM, Kevin Ford k...@3windmills.com wrote:

  (and am
  looking into a java triplestore to run in Tomcat)
 -- I don't know if the parenthetical was simply a statement or a
 solicitation - apologies if it was the former.

 Take a look at Mulgara.  Drops right into Tomcat.

 http://mulgara.org/

 --Kevin




 On 05/08/2012 02:01 PM, Ethan Gruber wrote:

 For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,
 but I feel like I'm missing some sort of inside joke here.

 Thanks for the info.  To clarify, I don't develop in java, but deploy
 well-established java-based apps in Tomcat, like Solr and eXist (and am
 looking into a java triplestore to run in Tomcat) and write scripts to
 make
 these web services interact in whichever language seems to be the most
 appropriate.  Node looks like it may be interesting to play around with,
 but I'm wary of having to learn something completely new, jettisoning
 every
 application and language I am experienced with, to put a new project into
 production in the next 4-8 weeks.

 Ethan

 On Tue, May 8, 2012 at 1:15 PM, Nate Vack njv...@wisc.edu wrote:

  On Tue, May 8, 2012 at 11:45 AM, Ross Singer rossfsin...@gmail.com
 wrote:

 On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:


 in.  Our data is exclusively XML, so LAMP/Rails aren't really options.


 ^^ Really?  Nobody's going to take the bait with this one?


 I can't see why they would; parsing XML in ruby is simply not possible.

 ;-)

 -n




[CODE4LIB] Anyone using node.js?

2012-05-07 Thread Ethan Gruber
Hi all,

It was recently suggested to me that a project I am working on may adopt
node.js for its architecture (well, be completely re-written for node.js).
I don't know anything about node.js, and have only heard of it in some
passing discussions on the list.  I'd like to know if anyone on code4lib
has experience developing in this platform, and what their thoughts are on
it, positive or negative.

Thanks,
Ethan


Re: [CODE4LIB] Omeka and CoSign

2012-04-20 Thread Ethan Gruber
Hi Ken,

You may get a response here, but the Omeka Google Group community offers
really great support.  I'd ask there as well.

Ethan

On Fri, Apr 20, 2012 at 12:30 PM, Varnum, Ken var...@umich.edu wrote:

 We're hoping to use our campus CoSign authentication system with Omeka,
 allowing campus users to log in with our campus single sign-on and (where
 appropriate permissions have been granted to that user ID in Omeka) getting
 the user to the admin pages, bypassing the Omeka login screen. Has anyone
 done this? If so, could you lend us some advice (or code)?

 Ken


 --
 Ken Varnum
 Web Systems Manager   E: var...@umich.edu
 University of Michigan LibraryT: 734-615-3287
 300C Hatcher Graduate Library F: 734-647-6897
 Ann Arbor, MI 48109-1190
 http://www.lib.umich.edu/users/varnum



[CODE4LIB] Representing geographic hiearchy in linked data

2012-04-18 Thread Ethan Gruber
 No Message Collected 


Re: [CODE4LIB] Author authority records to create publication feed?

2012-04-13 Thread Ethan Gruber
It appears that academia.edu still does not have an Atom/RSS feed for
member activity and listed publications, but I think such a feature would
be very useful.  If there was a concerted effort to demand such a service,
academia.edu might consider implementing it.

Ethan

On Fri, Apr 13, 2012 at 9:25 AM, Paul Butler (pbutler3) pbutl...@umw.edu wrote:

 Howdy All,

 Some folks from across campus just came to my door with this question.  I
 am still trying to work through the possibilities and problems, but thought
 others might have encountered something similar.

 They are looking for a way to create a feed (RSS, or anything else that
 might work) for each faculty member on campus to collect and link to their
 publications, which can then be embedded into their faculty profile webpage
 (in WordPress).

 I realize the vendors (JSTOR, EBSCO, etc.) allow author RSS feeds, but
 that really does not allow for disambiguation between folks with the same
 name and variants in name citation.  It appears Web of Science has author
 authority records and a set of apis, but we currently do not subscribe to
 WoS and am waiting for a trial to test.  What we need is something similar
 to this: http://arxiv.org/help/author_identifiers

 We can ask faculty members to upload their own citations and then just
 auto link out to something like Serials Solutions' Journal Finder,  but
 that is likely not sustainable.

 So, any suggestions - particularly free or low cost solutions.  Thanks!

 Cheers, Paul
 +-+-+-+-+-+-+-+-+-+-+-+-+
 Paul R Butler
 Assistant Systems Librarian
 Simpson Library
 University of Mary Washington
 1801 College Avenue
 Fredericksburg, VA 22401
 540.654.1756
 libraries.umw.edu

 Sent from the mighty Dell Vostro 230.



Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-11 Thread Ethan Gruber
Thanks to everyone for the suggestions.

Ethan

On Tue, Apr 10, 2012 at 7:43 PM, Simon Spero sesunc...@gmail.com wrote:

 On Mon, Apr 9, 2012 at 7:13 PM, Ethan Gruber ewg4x...@gmail.com wrote:

  Ancient geographic entities.  Athens is in Attica.  Sardis is in Lydia
 (in
  Anatolia, for example).  If these were modern geopolitical entities, I
  would use geonames.  We're linking cities to Pleiades, but Pleiades does
  not maintain parent::child geographic relationships.


 geoPoliticalSubdivision may work for you. You could assert this as a
 subPropertyOf ObjectInverseOf(partOf), since BIG is a
 holonym (http://en.wikipedia.org/wiki/Holonymy) of SMALL. Also, it is
 probably a bad idea to use partOf if there is a more
 specific sub-property that you can use that will better capture the
 intended  meaning - for example, components of a kit, or series in a fonds.

 http://sw.opencyc.org/concept/Mx4rvfGaTZwpEbGdrcN5Y29ycA

 (geopoliticalSubdivision BIG SMALL) means that
 the GeopoliticalEntity SMALL is a part of the
 larger GeopoliticalEntity BIG. The territory (see the constant TerritoryFn)
 of SMALL is a geographical sub-region (see the
 predicate geographicalSubRegions) of the territory of BIG. The government
 (see the constant GovernmentFn) of BIG usually has some sovereignty over
 the government of SMALL.

 Simon



Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-09 Thread Ethan Gruber
Ancient geographic entities.  Athens is in Attica.  Sardis is in Lydia (in
Anatolia, for example).  If these were modern geopolitical entities, I
would use geonames.  We're linking cities to Pleiades, but Pleiades does
not maintain parent::child geographic relationships.
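
A hedged rdflib sketch of asserting the hierarchy under discussion. Which predicate to use is exactly the open question in this thread: dcterms:isPartOf is shown, with CIDOC-CRM's P89_falls_within noted as an alternative (namespace form assumed). The place URIs are illustrative, not real Pleiades IDs.

from rdflib import Graph, Namespace, URIRef

DCTERMS = Namespace("http://purl.org/dc/terms/")
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")   # assumed namespace form

g = Graph()
athens = URIRef("http://example.org/place/athens")
attica = URIRef("http://example.org/place/attica")
greece = URIRef("http://example.org/place/greece")

g.add((athens, DCTERMS.isPartOf, attica))   # or (athens, CRM.P89_falls_within, attica)
g.add((attica, DCTERMS.isPartOf, greece))

print(g.serialize(format="turtle"))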

Ethan
On Apr 9, 2012 5:53 PM, Simon Spero sesunc...@gmail.com wrote:

 Are you talking about geographical entities, or geopolitical ones? For
 example,  is there an answer to the question what country is
 constantinople located in?

 Simon
 On Apr 8, 2012 8:02 PM, Ethan Gruber ewg4x...@gmail.com wrote:

  CIDOC-CRM may be the answer here. I will look over the documentation in
  greater detail tomorrow.
 
  Thanks,
  Ethan
  On Apr 8, 2012 7:56 PM, Ethan Gruber ewg4x...@gmail.com wrote:
 
   The data is modeled, but I want to use an ontology for geographic
  concepts
   that already exists, if possible.  If anything, my issue highlights the
   point that linked data can be *too* flexible.
   On Apr 8, 2012 3:54 PM, Michael Hopwood mich...@editeur.org wrote:
  
   I think this highlights the point that, at some point, you have to
 model
   the data.
  
   -Original Message-
   From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
 Of
   Ethan Gruber
   Sent: 08 April 2012 15:44
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked
 data
  
   Hi,
  
   Thanks for the info, but it's not quite what I'm looking for.  We've
   established authority control for ancient places, but I'm looking for
 an
   ontology I can use to describe the child:parent relationship between
  city
   and region or region and larger region (in any way that isn't
   dcterms:partOf).  Geonames has defined their own vocabulary that can't
   really be reused in other geographic contexts, e.g. with
 gn:countryCode,
   gn:parentCountry.
  
   Thanks,
   Ethan
  
   On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle li...@kcoyle.net
 wrote:
  
Also, there is Geonames (http://www.geonames.org), which is the
primary geographic data set on the Semantic Web. Here is the link to
   Athens:
   
     http://www.geonames.org/search.html?q=athens&country=GR
   
kc
   
   
On 4/6/12 4:54 PM, Karen Miller wrote:
   
Ethan, have you considered Getty's Thesaurus of Geographic Names?
  It
does provide a geographic hierarchy, although the data for Athens
they provide isn't quite the one you've described:
   
     http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
   
This vocabulary is available in XML here:
   
   
     http://www.getty.edu/research/tools/vocabularies/obtain/index.html
  
   
I have looked at it but not used it; it's a big tangled mess of
 XML.
   
MODS mimics a hierarchy (the subject/hierarchicalGeographic element
has these children: continent, country, province, region, state,
territory, county, city, island, area, extraterrestrialArea,
citySection). The VRA Core location element provides a similar
  mapping.
   
I try to stay away from Dublin Core, but I did venture onto the DC
Terms page just now and saw TGN listed in the vocabulary encoding
schemes there, so probably someone has implemented it.
   
Karen
   
   
Karen D. Miller
Monographic/Digital Projects Cataloger Bibliographic Services Dept.
Northwestern University Library
Evanston, IL
k-mill...@northwestern.edu
847-467-3462
   
   
   
   
-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.EDU CODE4LIB@LISTSERV.ND.EDU]
On Behalf Of Ethan Gruber
Sent: Thursday, April 05, 2012 12:49 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Representing geographic hiearchy in linked data
   
Hi all,
   
I have a dilemma that needs to be sorted out.  I'm looking for an
ontology that can describe geographic hierarchy, and hopefully
  someone
   on
the list has experience with this.  For example, if I have an RDF
   record
that describes Athens, I want to point Athens to Attica, and Attica
  to
Greece, and so on.  The current proposal is to use dcterms:partOf,
  but
   the
problem with this is that our records will also use dcterms:partOf
 to
describe a completely different type of relational concept, and it
   will be
almost impossible for scripts to recognize the difference between
   these two
uses of the same DC term.
   
Thanks,
Ethan
   
   
--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
   
  
  
 



Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
Hi,

Thanks for the info, but it's not quite what I'm looking for.  We've
established authority control for ancient places, but I'm looking for an
ontology I can use to describe the child:parent relationship between city
and region or region and larger region (in any way that isn't
dcterms:partOf).  Geonames has defined their own vocabulary that can't
really be reused in other geographic contexts, e.g. with gn:countryCode,
gn:parentCountry.

Thanks,
Ethan

On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle li...@kcoyle.net wrote:

 Also, there is Geonames (http://www.geonames.org), which is the primary
 geographic data set on the Semantic Web. Here is the link to Athens:

 http://www.geonames.org/search.html?q=athens&country=GR

 kc


 On 4/6/12 4:54 PM, Karen Miller wrote:

 Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
 does provide a geographic hierarchy, although the data for Athens they
 provide isn't quite the one you've described:

 http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393

 This vocabulary is available in XML here:

 http://www.getty.edu/research/tools/vocabularies/obtain/index.html

 I have looked at it but not used it; it's a big tangled mess of XML.

 MODS mimics a hierarchy (the subject/hierarchicalGeographic element has
 these children: continent, country, province, region, state, territory,
 county, city, island, area, extraterrestrialArea, citySection). The VRA
 Core location element provides a similar mapping.

 I try to stay away from Dublin Core, but I did venture onto the DC Terms
 page just now and saw TGN listed in the vocabulary encoding schemes there,
 so probably someone has implemented it.

 Karen


 Karen D. Miller
 Monographic/Digital Projects Cataloger
 Bibliographic Services Dept.
 Northwestern University Library
 Evanston, IL
 k-mill...@northwestern.edu
 847-467-3462




 -Original Message-
 From: Code for Libraries 
 [mailto:code4...@listserv.nd.EDU CODE4LIB@LISTSERV.ND.EDU]
 On Behalf Of Ethan Gruber
 Sent: Thursday, April 05, 2012 12:49 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Representing geographic hiearchy in linked data

 Hi all,

 I have a dilemma that needs to be sorted out.  I'm looking for an
 ontology that can describe geographic hierarchy, and hopefully someone on
 the list has experience with this.  For example, if I have an RDF record
 that describes Athens, I want to point Athens to Attica, and Attica to
 Greece, and so on.  The current proposal is to use dcterms:partOf, but the
 problem with this is that our records will also use dcterms:partOf to
 describe a completely different type of relational concept, and it will be
 almost impossible for scripts to recognize the difference between these two
 uses of the same DC term.

 Thanks,
 Ethan


 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet



Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
The data is modeled, but I want to use an ontology for geographic concepts
that already exists, if possible.  If anything, my issue highlights the
point that linked data can be *too* flexible.
On Apr 8, 2012 3:54 PM, Michael Hopwood mich...@editeur.org wrote:

 I think this highlights the point that, at some point, you have to model
 the data.

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Ethan Gruber
 Sent: 08 April 2012 15:44
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked data

 Hi,

 Thanks for the info, but it's not quite what I'm looking for.  We've
 established authority control for ancient places, but I'm looking for an
 ontology I can use to describe the child:parent relationship between city
 and region or region and larger region (in any way that isn't
 dcterms:partOf).  Geonames has defined their own vocabulary that can't
 really be reused in other geographic contexts, e.g. with gn:countryCode,
 gn:parentCountry.

 Thanks,
 Ethan

 On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle li...@kcoyle.net wrote:

  Also, there is Geonames (http://www.geonames.org), which is the
  primary geographic data set on the Semantic Web. Here is the link to
 Athens:
 
  http://www.geonames.org/search.html?q=athens&country=GR
 
  kc
 
 
  On 4/6/12 4:54 PM, Karen Miller wrote:
 
  Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
  does provide a geographic hierarchy, although the data for Athens
  they provide isn't quite the one you've described:
 
  http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
 
  This vocabulary is available in XML here:
 
  http://www.getty.edu/research/tools/vocabularies/obtain/index.html
 
  I have looked at it but not used it; it's a big tangled mess of XML.
 
  MODS mimics a hierarchy (the subject/hierarchicalGeographic element
  has these children: continent, country, province, region, state,
  territory, county, city, island, area, extraterrestrialArea,
  citySection). The VRA Core location element provides a similar mapping.
 
  I try to stay away from Dublin Core, but I did venture onto the DC
  Terms page just now and saw TGN listed in the vocabulary encoding
  schemes there, so probably someone has implemented it.
 
  Karen
 
 
  Karen D. Miller
  Monographic/Digital Projects Cataloger Bibliographic Services Dept.
  Northwestern University Library
  Evanston, IL
  k-mill...@northwestern.edu
  847-467-3462
 
 
 
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.EDU CODE4LIB@LISTSERV.ND.EDU]
  On Behalf Of Ethan Gruber
  Sent: Thursday, April 05, 2012 12:49 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] Representing geographic hiearchy in linked data
 
  Hi all,
 
  I have a dilemma that needs to be sorted out.  I'm looking for an
  ontology that can describe geographic hierarchy, and hopefully someone
 on
  the list has experience with this.  For example, if I have an RDF record
  that describes Athens, I want to point Athens to Attica, and Attica to
  Greece, and so on.  The current proposal is to use dcterms:partOf, but
 the
  problem with this is that our records will also use dcterms:partOf to
  describe a completely different type of relational concept, and it will
 be
  almost impossible for scripts to recognize the difference between these
 two
  uses of the same DC term.
 
  Thanks,
  Ethan
 
 
  --
  Karen Coyle
  kco...@kcoyle.net http://kcoyle.net
  ph: 1-510-540-7596
  m: 1-510-435-8234
  skype: kcoylenet
 



Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
CIDOC-CRM may be the answer here. I will look over the documentation in
greater detail tomorrow.

Thanks,
Ethan
On Apr 8, 2012 7:56 PM, Ethan Gruber ewg4x...@gmail.com wrote:

 The data is modeled, but I want to use an ontology for geographic concepts
 that already exists, if possible.  If anything, my issue highlights the
 point that linked data can be *too* flexible.
 On Apr 8, 2012 3:54 PM, Michael Hopwood mich...@editeur.org wrote:

 I think this highlights the point that, at some point, you have to model
 the data.

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Ethan Gruber
 Sent: 08 April 2012 15:44
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked data

 Hi,

 Thanks for the info, but it's not quite what I'm looking for.  We've
 established authority control for ancient places, but I'm looking for an
 ontology I can use to describe the child:parent relationship between city
 and region or region and larger region (in any way that isn't
 dcterms:partOf).  Geonames has defined their own vocabulary that can't
 really be reused in other geographic contexts, e.g. with gn:countryCode,
 gn:parentCountry.

 Thanks,
 Ethan

 On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle li...@kcoyle.net wrote:

  Also, there is Geonames (http://www.geonames.org), which is the
  primary geographic data set on the Semantic Web. Here is the link to
 Athens:
 
  http://www.geonames.org/search.html?q=athens&country=GR
 
  kc
 
 
  On 4/6/12 4:54 PM, Karen Miller wrote:
 
  Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
  does provide a geographic hierarchy, although the data for Athens
  they provide isn't quite the one you've described:
 
  http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
 
  This vocabulary is available in XML here:
 
  http://www.getty.edu/research/tools/vocabularies/obtain/index.html
 
  I have looked at it but not used it; it's a big tangled mess of XML.
 
  MODS mimics a hierarchy (the subject/hierarchicalGeographic element
  has these children: continent, country, province, region, state,
  territory, county, city, island, area, extraterrestrialArea,
  citySection). The VRA Core location element provides a similar mapping.
 
  I try to stay away from Dublin Core, but I did venture onto the DC
  Terms page just now and saw TGN listed in the vocabulary encoding
  schemes there, so probably someone has implemented it.
 
  Karen
 
 
  Karen D. Miller
  Monographic/Digital Projects Cataloger Bibliographic Services Dept.
  Northwestern University Library
  Evanston, IL
  k-mill...@northwestern.edu
  847-467-3462
 
 
 
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.EDU CODE4LIB@LISTSERV.ND.EDU]
  On Behalf Of Ethan Gruber
  Sent: Thursday, April 05, 2012 12:49 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] Representing geographic hiearchy in linked data
 
  Hi all,
 
  I have a dilemma that needs to be sorted out.  I'm looking for an
  ontology that can describe geographic hierarchy, and hopefully someone
 on
  the list has experience with this.  For example, if I have an RDF
 record
  that describes Athens, I want to point Athens to Attica, and Attica to
  Greece, and so on.  The current proposal is to use dcterms:partOf, but
 the
  problem with this is that our records will also use dcterms:partOf to
  describe a completely different type of relational concept, and it
 will be
  almost impossible for scripts to recognize the difference between
 these two
  uses of the same DC term.
 
  Thanks,
  Ethan
 
 
  --
  Karen Coyle
  kco...@kcoyle.net http://kcoyle.net
  ph: 1-510-540-7596
  m: 1-510-435-8234
  skype: kcoylenet
 




Re: [CODE4LIB] RDF advice

2012-02-14 Thread Ethan Gruber
Hi Karen,

Thanks.  Would it be odd to use foaf:primaryTopic when FOAF isn't used to
describe other attributes of a concept?

Ethan

On Mon, Feb 13, 2012 at 5:59 PM, Karen Coyle li...@kcoyle.net wrote:

 On 2/13/12 1:43 PM, Ethan Gruber wrote:

 Hi Patrick,

 Thanks.  That does make sense.  Hopefully others will weigh in with
 agreement (or disagreement).  Sometimes these semantic languages are so
 flexible that it's unsettling.  There are a million ways to do something
 with only de facto standards rather than restricted schemas.  For what
 it's
 worth, the metadata files describe coin-types, an intellectual concept in
 numismatics succinctly described at
 http://coins.about.com/od/coinsglossary/g/coin_type.htm,
 not physical
 objects in a collection.


 I believe this is similar to what FOAF does with primary topic:
 http://xmlns.com/foaf/spec/#term_primaryTopic

 In FOAF that usually points to a web page ABOUT the subject of the FOAF
 data, so a wikipedia web page about Stephen King would get this primary
 topic property. Presuming that your XML is http:// accessible, it might
 fit into this model.
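
A hedged sketch of the pattern Karen describes, in rdflib terms: treat the (hypothetical) richer XML record as a foaf:Document whose foaf:primaryTopic is the skos:Concept. All URIs below are illustrative.

from rdflib import Graph, Namespace, URIRef, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

g = Graph()
concept = URIRef("http://example.org/id/wheat-penny")       # the coin-type concept
record = URIRef("http://example.org/data/wheat-penny.xml")  # the richer metadata file

g.add((concept, RDF.type, SKOS.Concept))
g.add((record, RDF.type, FOAF.Document))
g.add((record, FOAF.primaryTopic, concept))   # or: concept foaf:isPrimaryTopicOf record

print(g.serialize(format="turtle"))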

 kc


 Ethan

 On Mon, Feb 13, 2012 at 4:28 PM, Patrick Murray-John
 patrickmjc...@gmail.com  wrote:

  Ethan,

 The semantics do seem odd there. It doesn't seem like a skos:Concept
 would
 typically link to a metadata record about -- if I'm following you right
 --
 a specific coin. Is this sort of a FRBRish approach, where your
 skos:Concept is similar to the abstraction of a frbr:Work (that is, the
 idea of a particular coin), where your metadata records are really
 describing the common features of a particular coin?

 If that's close, it seems like the richer metadata is really a sort of
 definition of the skos:Concept, so maybe skos:definition would do the
 trick? Something like this:

 ex:wheatPenny a skos:Concept ;
skos:prefLabel Wheat Penny ;
skos:definition Your richer, non RDF metadata document describing the
 front and back, years minted, etc.

 In XML that might be like:

 skos:Concept 
 about=http://example.org/wheatPenny

 
  skos:prefLabelWheat Penny/skos:prefLabel
  skos:definition
 Your richer, non RDF metadata document describing the front and back,
 years minted, etc.
  /skos:definition
  /skos:Concept


 It might raise an eyebrow to have, instead of a literal value for
 skos:definition, another set of structured, non RDF metadata. Better in
 that case to go with a document reference, and make your richer metadata
 a
 standalone document with its own URI:

 ex:wheatPenny skos:definition ex:wheatPennyDefinition.xml

 skos:Concept 
 about=http://example.org/wheatPenny
 
 skos:definition 
 resource=http://example.org/wheatPenny.xml
 

 /
 /skos:Concept

 I'm looking at the Documentation as a Document Reference section in SKOS
 Primer : http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
 


 Again, if I'm following, that might be the closest approach.

 Hope that helps,
 Patrick



 On 02/11/2012 09:53 PM, Ethan Gruber wrote:

  Hi Patrick,

 The richer metadata model is an ontology for describing coins.  It is
 more
 complex than, say, VRA Core or MODS, but not as hierarchically
 complicated
 as an EAD finding aid.  I'd like to link a skos:Concept to one of these
 related metadata records.  It doesn't matter if I use  skos, owl, etc.
 to
 describe this relationship, so long as it is a semantically appropriate
 choice.

 Ethan

 On Sat, Feb 11, 2012 at 2:32 PM, Patrick Murray-John
 patrickmjc...@gmail.com   wrote:

  Ethan,


 Maybe I'm being daft in missing it, but could I ask about more details
 in
 the richer metadata model? My hunch is that, depending on the details
 of
 the information you want to bring in, there might be more precise
 alternatives to what's in SKOS. Are you aiming to have a link between a
 skos:Concept and texts/documents related to that concept?

 Patrick


 On 02/11/2012 03:14 PM, Ethan Gruber wrote:

  Hi Ross,


 Thanks for the input.  My main objective is to make the richer
 metadata
 available one way or another to people using our web services.  Do you
 think it makes more sense to link to a URI of the richer metadata
 document
 as skos:related (or similar)?  I've seen two uses for
 skos:related--one
 to
 point to related skos:concepts, the other to point to web resources
 associated with that concept, e.g., a wikipedia article.  I have a
 feeling
 the latter is incorrect, at least

Re: [CODE4LIB] RDF advice

2012-02-13 Thread Ethan Gruber
Hi Patrick,

Thanks.  That does make sense.  Hopefully others will weigh in with
agreement (or disagreement).  Sometimes these semantic languages are so
flexible that it's unsettling.  There are a million ways to do something
with only de facto standards rather than restricted schemas.  For what it's
worth, the metadata files describe coin-types, an intellectual concept in
numismatics succinctly described at
http://coins.about.com/od/coinsglossary/g/coin_type.htm, not physical
objects in a collection.

Ethan

On Mon, Feb 13, 2012 at 4:28 PM, Patrick Murray-John 
patrickmjc...@gmail.com wrote:

 Ethan,

 The semantics do seem odd there. It doesn't seem like a skos:Concept would
 typically link to a metadata record about -- if I'm following you right --
 a specific coin. Is this sort of a FRBRish approach, where your
 skos:Concept is similar to the abstraction of a frbr:Work (that is, the
 idea of a particular coin), where your metadata records are really
 describing the common features of a particular coin?

 If that's close, it seems like the richer metadata is really a sort of
 definition of the skos:Concept, so maybe skos:definition would do the
 trick? Something like this:

 ex:wheatPenny a skos:Concept ;
skos:prefLabel Wheat Penny ;
skos:definition Your richer, non RDF metadata document describing the
 front and back, years minted, etc.

 In XML that might be like:

 skos:Concept 
 about=http://example.org/wheatPenny
 
  skos:prefLabelWheat Penny/skos:prefLabel
  skos:definition
 Your richer, non RDF metadata document describing the front and back,
 years minted, etc.
  /skos:definition
  /skos:Concept


 It might raise an eyebrow to have, instead of a literal value for
 skos:definition, another set of structured, non RDF metadata. Better in
 that case to go with a document reference, and make your richer metadata a
 standalone document with its own URI:

 ex:wheatPenny skos:definition ex:wheatPennyDefinition.xml

 skos:Concept 
 about=http://example.org/wheatPenny
 
 skos:definition 
 resource=http://example.org/wheatPenny.xml
 /
 /skos:Concept

 I'm looking at the Documentation as a Document Reference section in SKOS
 Primer : 
 http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/

 Again, if I'm following, that might be the closest approach.

 Hope that helps,
 Patrick



 On 02/11/2012 09:53 PM, Ethan Gruber wrote:

 Hi Patrick,

 The richer metadata model is an ontology for describing coins.  It is more
 complex than, say, VRA Core or MODS, but not as hierarchically complicated
 as an EAD finding aid.  I'd like to link a skos:Concept to one of these
 related metadata records.  It doesn't matter if I use  skos, owl, etc. to
 describe this relationship, so long as it is a semantically appropriate
 choice.

 Ethan

 On Sat, Feb 11, 2012 at 2:32 PM, Patrick Murray-John
 patrickmjc...@gmail.com  wrote:

  Ethan,

 Maybe I'm being daft in missing it, but could I ask about more details in
 the richer metadata model? My hunch is that, depending on the details of
 the information you want to bring in, there might be more precise
 alternatives to what's in SKOS. Are you aiming to have a link between a
 skos:Concept and texts/documents related to that concept?

 Patrick


 On 02/11/2012 03:14 PM, Ethan Gruber wrote:

  Hi Ross,

 Thanks for the input.  My main objective is to make the richer metadata
 available one way or another to people using our web services.  Do you
 think it makes more sense to link to a URI of the richer metadata
 document
 as skos:related (or similar)?  I've seen two uses for skos:related--one
 to
 point to related skos:concepts, the other to point to web resources
 associated with that concept, e.g., a wikipedia article.  I have a
 feeling
 the latter is incorrect, at least according to the documentation I've
 read
 on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
 point to dbpedia and other web resources.

 Thanks,
 Ethan

 On Sat, Feb 11, 2012 at 12:21 PM, Ross Singer rossfsin...@gmail.com
  wrote:

  On Fri, Feb 10, 2012 at 11:51 PM, Ethan Gruber ewg4x...@gmail.com

  wrote:

  Hi Ross,

 No, the richer ontology is not an RDF vocabulary, but it adheres to

  linked

  data concepts.

  Hmm, ok.  That doesn't necessarily mean it will work in RDF.

  I'm looking to do something like this example of embedding mods in
 rdf:

  
 http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2
 

 Yeah, I'll be honest, that looks terrible to me.  This looks, to me,
 like kind of a misunderstanding of RDF and RDF/XML.

 Regardless, this would make useless RDF (see below).  One

Re: [CODE4LIB] RDF advice

2012-02-11 Thread Ethan Gruber
Hi Ross,

Thanks for the input.  My main objective is to make the richer metadata
available one way or another to people using our web services.  Do you
think it makes more sense to link to a URI of the richer metadata document
as skos:related (or similar)?  I've seen two uses for skos:related--one to
point to related skos:concepts, the other to point to web resources
associated with that concept, e.g., a wikipedia article.  I have a feeling
the latter is incorrect, at least according to the documentation I've read
on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
point to dbpedia and other web resources.

Thanks,
Ethan

On Sat, Feb 11, 2012 at 12:21 PM, Ross Singer rossfsin...@gmail.com wrote:

 On Fri, Feb 10, 2012 at 11:51 PM, Ethan Gruber ewg4x...@gmail.com wrote:
  Hi Ross,
 
  No, the richer ontology is not an RDF vocabulary, but it adheres to
 linked
  data concepts.

 Hmm, ok.  That doesn't necessarily mean it will work in RDF.
 
  I'm looking to do something like this example of embedding mods in rdf:
 
 http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2
 
 Yeah, I'll be honest, that looks terrible to me.  This looks, to me,
 like kind of a misunderstanding of RDF and RDF/XML.

 Regardless, this would make useless RDF (see below).  One of the hard
 things to understand about RDF, especially when you're coming at it
 from XML (and, by association, RDF/XML) is that RDF isn't
 hierarchical, it's a graph.  This is one of the reasons that the XML
 serialization is so awkward: it looks like something familiar to XML people,
 but it doesn't work well with their tools (XPath, for example) despite
 the fact that it, you know, should.  It's equally frustrating for RDF
 people because it's really verbose and its syntax can come in a
 million variations (more on that later in the email) making it
 excruciatingly hard to parse.

  These semantic ontologies are so flexible, it seems like I *can* do
  anything, so I'm left wondering what I *should* do--what makes the most
  sense, semantically.  Is it possible to nest rdf:Description into the
  skos:Concept of my previous example, and then place nuds:nuds.more
  sophistated model../nuds:nuds into rdf:Description (or
 alternatively,
  set rdf:Description/@rdf:resource to the URI of the web-accessible XML
 file?
 
  Most RDF examples I've looked at online either have skos:Concept or
  rdf:Description, not both, either at the same context in rdf:RDF or one
  nested inside the other.
 
 So, this is a little tough to explain via email, I think.  This is
 what I was referring to earlier about the myriad ways to render RDF in
 XML.

 In short, using:
 skos:Concept about=http://example.org/foo;
  skos:prefLabelSomething/skos:prefLabel
  ...
 /skos:Concept

 is shorthand for:

 rdf:Description about=http://example.org/foo;
  rdf:type resource=http://www.w3.org/2004/02/skos/core#Concept; /
  skos:prefLabelSomething/skos:prefLabel
 /rdf:Description

 So, yeah, you use one or the other.

 That said, I'm not sure your ontology is really going to work well,
 you'll just have to try it.  One thing that would probably be useful
 would be to serialize out a document with your nuds vocabulary as
 rdf/xml and then use something like rapper (comes with the redland
 libraries) to convert it to something more RDF-friendly, like turtle,
 and see if it makes any sense.
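
 One hedged way to do that round-trip without rapper: parse the RDF/XML you
 serialized out with rdflib and re-emit it as Turtle to see whether the
 resulting triples still make sense. The filename here is hypothetical.

 from rdflib import Graph

 g = Graph()
 g.parse("nuds-test.rdf", format="xml")   # the RDF/XML serialization
 print(g.serialize(format="turtle"))      # eyeball the triples as Turtle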

 For example, your daisy example above:

 rdf:RDF
xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#;
    xmlns:mods=http://www.daisy.org/RDF/MODS;

rdf:Description rdf:ID=daisy-dtbook2005-exemplar-01

mods:titleInfo
mods:titleWorld Cultures and
 Geography/mods:title
/mods:titleInfo

mods:name
mods:namePartSarah Witham
 Bednarz/mods:namePart
mods:role
mods:roleTerm
 mods:type=textauthor/mods:roleTerm
/mods:role
/mods:name

mods:name
mods:namePartInés M.
 Miyares/mods:namePart
mods:role
mods:roleTerm
 mods:type=textauthor/mods:roleTerm
/mods:role
/mods:name

mods:name
mods:namePartMark C. Schug/mods:namePart
mods:role
mods:roleTerm
 mods:type=textauthor/mods:roleTerm
/mods:role
/mods:name

mods:name
mods:namePartCharles S.
 White/mods:namePart
mods:role
mods:roleTerm
 mods:type

Re: [CODE4LIB] RDF advice

2012-02-11 Thread Ethan Gruber
Hi Patrick,

The richer metadata model is an ontology for describing coins.  It is more
complex than, say, VRA Core or MODS, but not as hierarchically complicated
as an EAD finding aid.  I'd like to link a skos:Concept to one of these
related metadata records.  It doesn't matter if I use  skos, owl, etc. to
describe this relationship, so long as it is a semantically appropriate
choice.

Ethan

On Sat, Feb 11, 2012 at 2:32 PM, Patrick Murray-John 
patrickmjc...@gmail.com wrote:

 Ethan,

 Maybe I'm being daft in missing it, but could I ask about more details in
 the richer metadata model? My hunch is that, depending on the details of
 the information you want to bring in, there might be more precise
 alternatives to what's in SKOS. Are you aiming to have a link between a
 skos:Concept and texts/documents related to that concept?

 Patrick


 On 02/11/2012 03:14 PM, Ethan Gruber wrote:

 Hi Ross,

 Thanks for the input.  My main objective is to make the richer metadata
 available one way or another to people using our web services.  Do you
 think it makes more sense to link to a URI of the richer metadata document
 as skos:related (or similar)?  I've seen two uses for skos:related--one to
 point to related skos:concepts, the other to point to web resources
 associated with that concept, e.g., a wikipedia article.  I have a feeling
 the latter is incorrect, at least according to the documentation I've read
 on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
 point to dbpedia and other web resources.
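
 (Sketched in Turtle, with invented URIs and prefixes omitted, since the
 property choice is exactly the open question:

   # concept-to-concept, which seems to be what the SKOS documentation intends
   <http://example.org/id/owl> skos:related <http://example.org/id/athena> .

   # concept-to-web-page, the use I suspect is incorrect
   <http://example.org/id/owl> skos:related <http://en.wikipedia.org/wiki/Owl> .

   # the VIAF-style assertion pointing at an equivalent resource elsewhere
   <http://example.org/id/owl> owl:sameAs <http://dbpedia.org/resource/Owl> .)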

 Thanks,
 Ethan

 On Sat, Feb 11, 2012 at 12:21 PM, Ross Singer rossfsin...@gmail.com
  wrote:

  On Fri, Feb 10, 2012 at 11:51 PM, Ethan Gruber ewg4x...@gmail.com
  wrote:

 Hi Ross,

 No, the richer ontology is not an RDF vocabulary, but it adheres to
 linked data concepts.

 Hmm, ok.  That doesn't necessarily mean it will work in RDF.

 I'm looking to do something like this example of embedding mods in rdf:

  http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2

 Yeah, I'll be honest, that looks terrible to me.  This looks, to me,
 like kind of a misunderstanding of RDF and RDF/XML.

 Regardless, this would make useless RDF (see below).  One of the hard
 things to understand about RDF, especially when you're coming at it
 from XML (and, by association, RDF/XML) is that RDF isn't
 hierarchical, it's a graph.  This is one of the reasons that the XML
 serialization is so awkward: it looks like something familiar to XML people,
 but it doesn't work well with their tools (XPath, for example) despite
 the fact that it, you know, should.  It's equally frustrating for RDF
 people because it's really verbose and its syntax can come in a
 million variations (more on that later in the email) making it
 excruciatingly hard to parse.

  These semantic ontologies are so flexible, it seems like I *can* do
 anything, so I'm left wondering what I *should* do--what makes the most
 sense, semantically.  Is it possible to nest rdf:Description into the
 skos:Concept of my previous example, and then place <nuds:nuds>..more
 sophisticated model..</nuds:nuds> into rdf:Description (or alternatively,
 set rdf:Description/@rdf:resource to the URI of the web-accessible XML
 file)?

 Most RDF examples I've looked at online either have skos:Concept or
 rdf:Description, not both, either at the same context in rdf:RDF or one
 nested inside the other.

  So, this is a little tough to explain via email, I think.  This is
 what I was referring to earlier about the myriad ways to render RDF in
 XML.

 In short, using:
 <skos:Concept rdf:about="http://example.org/foo">
   <skos:prefLabel>Something</skos:prefLabel>
   ...
 </skos:Concept>

 is shorthand for:

 <rdf:Description rdf:about="http://example.org/foo">
   <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
   <skos:prefLabel>Something</skos:prefLabel>
 </rdf:Description>

 So, yeah, you use one or the other.

 That said, I'm not sure your ontology is really going to work well,
 you'll just have to try it.  One thing that would probably be useful
 would be to serialize out a document with your nuds vocabulary as
 rdf/xml and then use something like rapper (comes with the redland
 libraries) to convert it to something more RDF-friendly, like turtle,
 and see if it makes any sense.

 For example, your daisy example above:

 <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mods="http://www.daisy.org/RDF/MODS">

    <rdf:Description rdf:ID="daisy-dtbook2005-exemplar-01">

        <mods:titleInfo>
            <mods:title>World Cultures and Geography</mods:title>
        </mods:titleInfo>

        <mods:name>

Re: [CODE4LIB] Metadata

2012-02-10 Thread Ethan Gruber
An interface is only as useful as the metadata allows it to be, and the
metadata is only as useful as the interface built to take advantage of it.

Ethan

On Fri, Feb 10, 2012 at 4:10 PM, David Faler dfa...@tlcdelivers.com wrote:

 I think the answer is make sure you are able to add new elements to the
 store later, and keep around your source data and plan to be able to
 reprocess it.  Something like what XC is doing.  That way, you get to be
 agile at the beginning and just deal with what you *know* is absolutely
 needed, and add more when you can make a business case for it.  Especially
 if you are looking to deal with MARC or ONIX data.

 On Fri, Feb 10, 2012 at 3:57 PM, Patrick Berry pbe...@gmail.com wrote:

  So, one question I forgot to toss out at the Ask Anything session is:
 
  When do you know you have enough metadata?
 
  "You'll know it when you have it" isn't the response I'm looking for.
  So,
  I'm sure you're wondering what the context for this question is, and
  honestly there is none.  This is geared towards contentDM or DSpace or
  Omeka or Millennium.  I've seen groups not plan enough for collecting
 data
  and I've seen groups that have been planning so long they forgot what
  they were supposed to be collecting in the first place.
 
  So, I'll just throw that vague question out there and see who wants to
 take
  a swing.
 
  Thanks,
  Pat/@pberry
 



Re: [CODE4LIB] RDF advice

2012-02-10 Thread Ethan Gruber
Hi Ross,

No, the richer ontology is not an RDF vocabulary, but it adheres to linked
data concepts.

I'm looking to do something like this example of embedding mods in rdf:
http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2

These semantic ontologies are so flexible, it seems like I *can* do
anything, so I'm left wondering what I *should* do--what makes the most
sense, semantically.  Is it possible to nest rdf:Description into the
skos:Concept of my previous example, and then place <nuds:nuds>..more
sophisticated model..</nuds:nuds> into rdf:Description (or alternatively,
set rdf:Description/@rdf:resource to the URI of the web-accessible XML file)?

Most RDF examples I've looked at online either have skos:Concept or
rdf:Description, not both, either at the same context in rdf:RDF or one
nested inside the other.
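
(Put in Turtle terms, with invented URIs, nuds:obverse as a made-up example
property, and rdfs:seeAlso standing in for whatever property turns out to be
appropriate, the two shapes I'm weighing are roughly:

  # a. embed: nuds terms hung directly off the concept
  <http://example.org/id/foo> a skos:Concept ;
      skos:prefLabel "Label"@en ;
      nuds:obverse "..." .

  # b. reference: point at the separately hosted XML record
  <http://example.org/id/foo> a skos:Concept ;
      skos:prefLabel "Label"@en ;
      rdfs:seeAlso <http://example.org/id/foo.xml> .)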

Thanks,
Ethan

On Fri, Feb 10, 2012 at 9:44 PM, Ross Singer rossfsin...@gmail.com wrote:

 The whole advantage of RDF is that you can pull properties from different
 vocabularies (as long as they're not logically disjoint). So, assuming your
 richer ontology is some kind of RDF vocabulary, this is exactly *what* you
 should be doing.

 -Ross.

 On Feb 10, 2012, at 4:31 PM, Ethan Gruber ewg4x...@gmail.com wrote:

  Hi all,
 
  I'm working on an RDF model for describing concepts.  I have skos:Concept
  nested inside rdf:RDF.  Most documents will have little more than labels
  and related links inside of skos:Concept.  However, for a certain type of
  concept, we have XML documents with a more sophisticated ontology and
  structure for describing the concept.  I could embed this metadata into
 the
  RDF or reference it as an rdf:resource.  It doesn't matter much to me
  either way, but I'm unsure of the semantically correct way to create this
  model.
 
  Suppose I have:
 
  <rdf:RDF>
    <skos:Concept rdf:about="URI">
      <skos:prefLabel xml:lang="en">Label</skos:prefLabel>
      <nuds:nuds>..more sophisticated model..</nuds:nuds>
    </skos:Concept>
  </rdf:RDF>
 
  Is it okay to have the more sophisticated metadata model embedded in
  skos:Concept alongside labels and related links?  Suppose I want to store
  the more sophisticated metadata separately and reference it?  I'm not
 sure
  what property adequately addresses this relation, semantically.
 
  Recommendations?
 
  Thanks,
  Ethan



Re: [CODE4LIB] Job: Head, Digital Projects Metadata, Beinecke Rare Book and Manuscript Library at Yale University

2012-02-07 Thread Ethan Gruber
Why are MLS degrees always required for these sorts of jobs?

Ethan

On Tue, Feb 7, 2012 at 4:21 PM, jobs4...@gmail.com wrote:

 Yale University offers exciting opportunities for achievement and growth in
 New Haven, Connecticut. Conveniently located between Boston and New York,
 New
 Haven is the creative capital of Connecticut with cultural resources that
 include two major art museums, a critically-acclaimed repertory theater,
 state-of-the-art concert hall, and world-renowned schools of Architecture,
 Art, Drama, and Music.

 **The University and the Library**
 The Yale University Library, as one of the world's leading research
 libraries,
 collects, organizes, preserves, and provides access to and services for a
 rich
 and unique record of human thought and creativity. It fosters intellectual
 growth and is a highly valued partner in the teaching and research
 missions of
 Yale University and scholarly communities worldwide. A distinctive
 strength is
 its rich spectrum of resources, including more than 12.5 million volumes
 and
 information in all media, ranging from ancient papyri to early printed
 books
 to electronic databases. The Library is engaged in numerous digital
 initiatives designed to provide access to a full array of scholarly
 information. Housed in the Sterling Memorial Library and twenty school and
 departmental libraries, it employs a dynamic, diverse, and innovative
 staff of
 over 500 who have the opportunity to work with the highest caliber of
 faculty
 and students, participate on committees, and are involved in other areas of
 staff development. For additional information on the Yale University
 Library,
 please visit the Library's web site at http://www.library.yale.edu/.

 **Beinecke Rare Book and Manuscript Library**
 The Beinecke Library is Yale's principal repository for literary papers and
 early manuscripts and rare books. In addition to distinguished general
 collections, the library houses the Osborn Collection, noted for its
 British
 and literary and historical manuscripts, and outstanding special
 collections
 devoted to American literature, German literature, and Western Americana.
 The
 Beinecke's collections include materials ranging from medieval manuscripts
 to
 born-digital electronic records, audio and video. The Beinecke has
 undertaken
 an ambitious digitization program and offers online access to over 150,000
 images through its Digital Images Online database, as well as access to
 streaming audio and video, and to a host of online exhibitions and digital
 projects involving blogs, podcasts, and social-tagging. The Beinecke is
 currently engaged in bringing intentionality to the development of the
 Library's digital resources and projects, and to providing responsive and
 effective services to online users of the Beinecke's materials as well as
 thoughtful integration with other digital efforts at Yale. For additional
 information about the Beinecke Library, visit http://www.library.yale.edu/beinecke/.

 **General Purpose**
 Under the general direction of the Head of Technical Services and working
 in
 close collaboration with the Head of Technology and Digital Assets, the
 Digital Imaging Studio Production Manager, and units across the Beinecke
 Library, the Head of Digital Projects & Metadata plays a leading role in
 creating, describing, and delivering digitized resources and in exploring,
 proposing, and developing innovative tools and services that improve the
 ability of scholars, students, and educators to make use of existing and
 emerging digital resources.

 **Responsibilities**
 The Head of Digital Projects & Metadata is responsible for the day-to-day
 management of a variety of digital projects and is responsible for
 overseeing
 and creating metadata across a wide range of materials including
 manuscripts,
 photographs, ephemera, art objects, maps, prints and drawings, books, and
 other printed material. The Head of Digital Projects & Metadata provides
 leadership and technical expertise in the investigation and application of
 new
 metadata standards; defines input standards; devises quality control
 routines;
 proposes local policies and procedures; maintains and enhances current
 metadata infrastructure and practices; prepares and evaluates material for
 digital capture; participates in managing the workflow of the Digital
 Studio
 and coordinates and supervises metadata creation by staff, student
 assistants,
 and interns; hires and supervises Digital Projects  Metadata staff;
 provides
 guidance, training, skill development, and performance evaluation;
 participates in the formulation of policies and procedures for the
 Technical
 Services Department. The Head of Digital Projects  Metadata is a liaison
 to
 the Technology and Digital Assets Department and works collaboratively with
 other Library staff to develop and employ improved interfaces and delivery
 tools. 

Re: [CODE4LIB] Job: Head, Digital Projects Metadata, Beinecke Rare Book and Manuscript Library at Yale University

2012-02-07 Thread Ethan Gruber
Interesting point about the flexibility of librarians, but it's certainly
possible to be knowledgeable and experienced with information management
and developing sophisticated metadata systems without having an MLS.  I'm
not reflecting on Yale specifically, but many of the job postings that fit
into this category that I have seen posted to code4lib over the years
require an MLS/MLIS.  I think it's fair to ask why this is the case.

Ethan

On Tue, Feb 7, 2012 at 4:32 PM, Kimberly Silk 
kimberly.s...@rotman.utoronto.ca wrote:

 Because we are trained in information management, and many of us
 specialize in management of digital assets. That said, there are many other
 professions that also have these skills and passion for the digital bit.
 Since it's Yale, there is likely an employment agreement that the library
 will hire those with an MLS or equivalent.

 Things change slowly in academia - but as librarians explore new roles, so
 should university libraries consider other types of professions. There's a
 lot of cross-over.

 Kim

 
 Kimberly Silk, MLS
 Data Librarian, Martin Prosperity Institute
 Rotman School of Management, University of Toronto
 E: kimberly.s...@martinprosperity.org
 T: http://twitter.com/kimberlysilk
 Skype: kimberly.silk



 On 2012-02-07, at 4:27 PM, Ethan Gruber wrote:

  Why are MLS degrees always required for these sorts of jobs?
 
  Ethan
 
  On Tue, Feb 7, 2012 at 4:21 PM, jobs4...@gmail.com wrote:
 
  Yale University offers exciting opportunities for achievement and
 growth in
  New Haven, Connecticut. Conveniently located between Boston and New
 York,
  New
  Haven is the creative capital of Connecticut with cultural resources
 that
  include two major art museums, a critically-acclaimed repertory theater,
  state-of-the-art concert hall, and world-renowned schools of
 Architecture,
  Art, Drama, and Music.
 
  **The University and the Library**
  The Yale University Library, as one of the world's leading research
  libraries,
  collects, organizes, preserves, and provides access to and services for
 a
  rich
  and unique record of human thought and creativity. It fosters
 intellectual
  growth and is a highly valued partner in the teaching and research
  missions of
  Yale University and scholarly communities worldwide. A distinctive
  strength is
  its rich spectrum of resources, including more than 12.5 million volumes
  and
  information in all media, ranging from ancient papyri to early printed
  books
  to electronic databases. The Library is engaged in numerous digital
  initiatives designed to provide access to a full array of scholarly
  information. Housed in the Sterling Memorial Library and twenty school
 and
  departmental libraries, it employs a dynamic, diverse, and innovative
  staff of
  over 500 who have the opportunity to work with the highest caliber of
  faculty
  and students, participate on committees, and are involved in other
 areas of
  staff development. For additional information on the Yale University
  Library,
  please visit the Library's web site at http://www.library.yale.edu/.
 
  **Beinecke Rare Book and Manuscript Library**
  The Beinecke Library is Yale's principal repository for literary papers
 and
  early manuscripts and rare books. In addition to distinguished general
  collections, the library houses the Osborn Collection, noted for its
  British
  and literary and historical manuscripts, and outstanding special
  collections
  devoted to American literature, German literature, and Western
 Americana.
  The
  Beinecke's collections include materials ranging from medieval
 manuscripts
  to
  born-digital electronic records, audio and video. The Beinecke has
  undertaken
  an ambitious digitization program and offers online access to over
 150,000
  images through its Digital Images Online database, as well as access to
  streaming audio and video, and to a host of online exhibitions and
 digital
  projects involving blogs, podcasts, and social-tagging. The Beinecke is
  currently engaged in bringing intentionality to the development of the
  Library's digital resources and projects, and to providing responsive
 and
  effective services to online users of the Beinecke's materials as well
 as
  thoughtful integration with other digital efforts at Yale. For
 additional
  information about the Beinecke Library, visit http://www.library.yale.edu/beinecke/.
 
  **General Purpose**
  Under the general direction of the Head of Technical Services and
 working
  in
  close collaboration with the Head of Technology and Digital Assets, the
  Digital Imaging Studio Production Manager, and units across the Beinecke
  Library, the Head of Digital Projects & Metadata plays a leading role in
  creating, describing, and delivering digitized resources and in
 exploring,
  proposing, and developing innovative tools and services

Re: [CODE4LIB] Metadata war stories...

2012-01-27 Thread Ethan Gruber
EDIT ME

http://ead.lib.virginia.edu/vivaxtf/view?docId=uva-sc/viu00888.xml;query=;brand=default#adminlink

On Fri, Jan 27, 2012 at 6:26 PM, Roy Tennant roytenn...@gmail.com wrote:

 Oh, I should have also mentioned that some of the worst problems occur
 when people treat their metadata like it will never leave their
 institution. When that happens you get all kinds of crazy cruft in a
 record. For example, just off the top of my head:

 * Embedded HTML markup (one of my favorites is an <img> tag)
 * URLs to remote resources that are hard-coded to go through a
 particular institution's proxy
 * Notes that only have meaning for that institution
 * Text that is meant to display to the end-user but may only do so in
 certain systems; e.g., "Click here" in a particular subfield.

 Sigh...
 Roy

 On Fri, Jan 27, 2012 at 4:17 PM, Roy Tennant roytenn...@gmail.com wrote:
  Thanks a lot for the kind shout-out Leslie. I have been pondering what
  I might propose to discuss at this event, since there is certainly
  plenty of fodder. Recently we (OCLC Research) did an investigation of
  856 fields in WorldCat (some 40 million of them) and that might prove
  interesting. By the time ALA rolls around there may something else
  entirely I could talk about.
 
  That's one of the wonderful things about having 250 million MARC
  records sitting out on a 32-node cluster. There are any number of
  potentially interesting investigations one could do.
  Roy
 
  On Thu, Jan 26, 2012 at 2:10 PM, Johnston, Leslie lesl...@loc.gov
 wrote:
  Roy's fabulous Bitter Harvest paper:
 http://roytennant.com/bitter_harvest.html
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
 Of Walter Lewis
  Sent: Wednesday, January 25, 2012 1:38 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Metadata war stories...
 
  On 2012-01-25, at 10:06 AM, Becky Yoose wrote:
 
  - Dirty data issues when switching discovery layers or using
  legacy/vendor metadata (ex. HathiTrust)
 
  I have a sharp recollection of a slide in a presentation Roy Tennant
 offered up at Access  (at Halifax, maybe), where he offered up a range of
 dates extracted from an array of OAI harvested records.  The good, the bad,
 the incomprehensible, the useless-without-context (01/02/03 anyone?) and on
 and on.  In my years of migrating data, I've seen most of those variants.
  (except ones *intended* to be BCE).
 
  Then there are the fielded data sets without authority control.  My
 favourite example comes from staff who nominally worked for me, so I'm not
 telling tales out of school.  The classic Dynix product had a Newspaper
 index module that we used before migrating it (PICK migrations; such a
 joy).  One title had twenty variations on Georgetown Independent (I wish
 I was kidding) and the dates ranged from the early ninth century until
 nearly the 3rd millenium. (apparently there hasn't been much change in
 local council over the centuries).
 
  I've come to the point where I hand-walk the spatial metadata to links
 with geonames.org for the linked open data. Never had to do it for a
 set with more than 40,000 entries though.  The good news is that it isn't
 hard to establish a valid additional entry when one is required.
 
  Walter



Re: [CODE4LIB] Why are we afraid to criticize library software in public?

2012-01-25 Thread Ethan Gruber
+1

On Wed, Jan 25, 2012 at 4:36 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 On 1/25/2012 1:13 PM, Kyle Banerjee wrote:

 itself. For example, there's a system used for many digital archives that
 splits a field in two anytime a character that needs to be represented by an
 XML entity is encountered. Name withheld to protect the guilty.


 Why are we so eager to 'protect the guilty' in discussions like this?

 Our reluctance to share info on problems with software we use (because of
 fear of offending the vendor?) means that it's very difficult for a library
 to find out about the plusses and minuses of any given product when
 evaluating solutions.

 Don't even bother googling -- nobody will publicly call this stuff out
 on a blog, or even in a public listserv!  It's on private customer-only
 listservs and bug trackers, or even more likely nowhere at all.  When you
 want to find out the real deal, you have to start from scratch, contact
 personal contacts at other institutions that have experience with each
 software you are curious about, and ask them one-on-one in private.
  Wasting time, cause everybody has to do that each time they want to find
 out the current issues, so many offline one-on-one conversations (or so
 many people that just give up and don't even do the 'due diligence'), only
 finding out about things your personal contact happened to have encountered.

 Why can't we just share this stuff in public and tell it like it is, so
 the information is available for people who need it?

 If you want to find out about problems and issues with _successful_
 software that isn't library-specific, it's not hard to. You can often find
 public issue trackers from the developers, but if not you can find public
 listservs and many blog posts where people aren't afraid to describe the
 problem(s) they encountered, there's no 'protecting of the guilty.' Hint,
 this is part of what _makes_ such software successful.

 Jonathan



[CODE4LIB] Announcement for Linked Ancient World Data Institute May 31-June 2, 2012, NYC

2012-01-12 Thread Ethan Gruber
*Applications due February 17*

New York University’s Institute for the Study of the Ancient World
(ISAW, http://isaw.nyu.edu/)
will host the Linked Ancient World Data Institute (LAWDI) from May 31st to
June 2nd, 2012 in New York City. “Linked Open
Data” (http://en.wikipedia.org/wiki/Linked_Data)
is an approach to the creation of digital resources that emphasizes
connections between diverse information on the basis of published and
stable web addresses (URIs) that identify common concepts and individual
items. LAWDI, funded by the Office of Digital Humanities of the National
Endowment for Humanities (http://www.neh.gov/odh/), will bring together an
international faculty of practitioners working in the field of Linked Data
with twenty attendees who are implementing or planning the creation of
digital resources.

LAWDI’s intellectual scope is the Ancient Mediterranean and Ancient Near
East, two fields in which a large and increasing number of digital
resources is available, with rich coverage of the archaeology, literature
and history of these regions. Many of these resources publish stable URIs
for their content and so are enabling links and re-use that create a varied
research and publication environment. LAWDI attendees will learn how to
take advantage of these resources and also how to contribute to the growing
network of linked scholarly materials.

The organizers encourage applications from faculty, university staff,
graduate students, librarians, museum professionals, archivists and others
with a serious interest in creating digital resources for the study of the
Ancient World. Applications to attend should take the form of a one-page
statement of interest e-mailed to sebastian.he...@nyu.edu by *Friday,
February 17*. A discussion of current or planned work should be a prominent
part of this statement. As part of the curriculum, successful applicants
will be asked to present their work and be ready to actively participate in
conversations about topics presented by faculty and the other participants.

The announcement for LAWDI is
here: http://wiki.digitalclassicist.org/Linked_Ancient_World_Data_Institute and
the organizers are grateful for any circulation of this information.

A second session of LAWDI will also take place from May 30 to June 1 of
2013 at Drew University in New Jersey (http://drew.edu).


[CODE4LIB] Embedding XHTML into RDF

2012-01-11 Thread Ethan Gruber
Hi all,

Suppose I have RDF describing an object, and I would like some fairly
free-form, human-generated description of the object (let's say within
dcterms:description).  Is it semantically acceptable to have XHTML nested
directly in this element or would this be considered uncouth for LOD?
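
(Concretely, at the triple level the markup would just ride along inside a
single literal -- assuming it is carried as a literal at all, e.g. via
rdf:parseType="Literal" or escaping.  A sketch in Turtle, with an invented
URI and text:

  <http://example.org/id/foo>
      dcterms:description "<p>A <em>free-form</em> note about the object.</p>" .

The alternative being the same text with the markup stripped out.)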

Thanks,
Ethan

