Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote: I’m sure this is way too much info for most (or all) on this list, but in case it is helpful, I thought I’d throw it out there. I disagree. I think this was fantastic and most enlightening. Most of us deal with this stuff all the time, yet we (obviously) have zero idea how it actually works, so it's nice to be schooled (and have this mini-lesson in LCSH contextually in the mailing list archives). Thanks for putting this out there, Kelley. -Ross.
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens o...@ostephens.com wrote: Then obviously I lose the context of the full heading - so I also want to look for Education--England--Finance (which I won't find on id.loc.gov as not authorised) At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms: Education--England (not authorised) Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008) My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements? I would definitely ask this question somewhere other than Code4lib (autocat, maybe?), since I think the answer is more complicated than this (although they could validate/invalidate your assumption about whether or not this approach would get you close enough). My understanding is that Education--England--Finance *is* authorized, because Education--Finance is and England is a free-floating geographic subdivision. Because it's an authorized heading, Education--England--Finance is, in fact, an authority. The problem is that free-floating subdivisions allow an almost infinite number of permutations, so there aren't LCCNs issued for them. This is where things get super-wonky. It's also the reason I initially created lcsubjects.org, specifically to give these (and, ideally, locally controlled subject headings) a publishing platform/centralized repository, but it quickly grew to be more than just a side project. There were issues of how the data would be constructed (esp. since, at the time, I had no access to the NAF), how to reconcile changes, provenance, etc. Add to that the fact that two years ago there wasn't much linked library data going on, and it was really hard to justify the effort. But, yeah, it would be worth running your ideas by a few catalogers to see what they think. -Ross.
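Owen's combination strategy can be sketched in a few lines of Ruby. This is purely illustrative: the method name and the `--` subdivision separator are my own conventions, not part of ruby-marc or any id.loc.gov API. Given a composite heading, it generates every sub-combination of subdivisions that keeps the leading topical term, so each candidate could then be checked against id.loc.gov.

```ruby
# Generate candidate headings from a subdivided LCSH string, always keeping
# the leading topical term (per Owen's theory above). The "--" separator and
# method name are assumptions for illustration.
def candidate_headings(heading)
  topic, *subdivisions = heading.split('--')
  candidates = []
  # every non-empty, order-preserving subset of the subdivisions
  (1..subdivisions.length).each do |n|
    subdivisions.combination(n).each do |combo|
      candidates << ([topic] + combo).join('--')
    end
  end
  candidates
end

candidate_headings('Education--England--Finance')
# => ["Education--England", "Education--Finance", "Education--England--Finance"]
```

Each candidate would still need to be resolved against id.loc.gov (or validated against subdivision rules) before being asserted, per the caveats in the thread.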
Re: [CODE4LIB] LCSH and Linked Data
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote: 1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields). 2. cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z. Yeah, this could get ugly pretty fast. It's a bit unclear to me what the distinction is between identical terms in both the geographic areas and the country codes (http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and http://id.loc.gov/vocabulary/countries/enk). Well, in LC's current representation, there *is* no distinction; they're both just skos:Concepts that (by virtue of skos:exactMatch) are effectively interchangeable. See also http://id.loc.gov/vocabulary/geographicAreas/fa and http://id.loc.gov/authorities/sh85009230#concept. You have a single institution minting multiple URIs for what is effectively the same thing (albeit in different vocabularies), although, ironically, nothing points at any actual real world objects. VIAF doesn't do much better in this particular case (there are lots of examples where it does, mind you): http://viaf.org/viaf/142995804 (see: http://viaf.org/viaf/142995804/rdf.xml). We have all of these triangulations around the concept of England or the Atlas mountains, but we can't actually refer to England or the Atlas mountains. Also, I am not somehow above this problem, either. With the linked MARC codes lists (http://purl.org/NET/marccodes/), I had to make a similar decision; I just chose to go the opposite route: define them as things, rather than concepts (http://purl.org/NET/marccodes/gacs/fa#location, http://purl.org/NET/marccodes/gacs/e-uk-en#location, http://purl.org/NET/marccodes/countries/enk#location, etc.), which presents its own set of problems (http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing no matter how liberal your definition).
At some point, it's worth addressing what these things actually *are* and, if indeed they are effectively the same thing, whether it's worth preserving these redundancies, because I think they'll cause grief in the future. -Ross.
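The redundancy being described can be made concrete with a tiny sketch that emits the SKOS triples in question as Turtle. This is not any official id.loc.gov serialization, just an illustration of two URIs related only by skos:exactMatch, with neither pointing at a real-world thing:

```ruby
# Sketch of the situation described above: two URIs, same institution,
# different vocabularies, tied together only by skos:exactMatch. Plain
# string-building; no RDF library assumed.
def exact_match_turtle(uri_a, uri_b)
  <<~TTL
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    <#{uri_a}> a skos:Concept ; skos:exactMatch <#{uri_b}> .
    <#{uri_b}> a skos:Concept ; skos:exactMatch <#{uri_a}> .
  TTL
end

puts exact_match_turtle(
  'http://id.loc.gov/vocabulary/geographicAreas/e-uk-en',
  'http://id.loc.gov/vocabulary/countries/enk'
)
```

Note what's missing from the output: any triple asserting that either concept is about the actual place, which is exactly the gap the email is pointing at.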
Re: [CODE4LIB] Documentation request for the marc gem
Thanks, Ed. That would have been a useful tidbit for me to have added :) Also, if there's interest, we can set up the Github Wiki for ruby-marc. There is some functionality that would be difficult to explain (including the pros and cons) in the rdocs, such as the XML parsers (and how to write new ones), and there are some caveats on when to use field maps in MARC::Record and when find/find_all works better. Anyway, this seems like it might be useful, and if others think so, too, well, let me know! Thanks! -Ross. On Wed, Mar 16, 2011 at 6:02 AM, Ed Summers e...@pobox.com wrote: Hi Tony, Just in case it wasn't obvious, the source code is on GitHub [1]. As Ross said, please consider forking it and sending a pull request for any documentation improvements you want to do. //Ed [1] https://github.com/ruby-marc/ruby-marc On Tue, Mar 15, 2011 at 3:18 PM, Ross Singer rossfsin...@gmail.com wrote: Hi Tony, I'm glad that ruby-marc appears to be generally useful. Another (even simpler) way to do what you want is: record.to_marc Which, I think, would do the same thing you're doing with MARC::Writer.encode. If you want to write up a block of text to plop into the README, feel free to send me some copy (wholesale edits also welcome). Thanks, -Ross. On Tue, Mar 15, 2011 at 2:40 PM, Tony Zanella tony.zane...@gmail.com wrote: Hello all, If I may suggest adding to the documentation for the marc gem (http://marc.rubyforge.org/)... Currently, the documentation gives examples for how to read, create and write MARC records. The source code also includes an encode method in MARC::Writer, which came in handy for me when I needed to send an encoded record off to be archived on the fly, without writing it to the filesystem. That method isn't in the documentation, but it would be nice to see there! It could be as simple as: # encoding a record MARC::Writer.encode(record) Thanks for your consideration! Tony
Re: [CODE4LIB] App Recommendations
Another possible alternative to Marginalia might be Markup.io: http://markup.io/ which I'm happy to plug because besides merely being cool, it was made by some folks that live in my neighborhood. It may not be exactly what you're looking for, though, since it's not necessarily text-centric. -Ross. On Thu, Mar 10, 2011 at 3:18 PM, Nathan Tallman ntall...@gmail.com wrote: Hi Code4Libers, I'm usually just a lurker on here (love to follow the threads and learn new things), but I'm presently in need of some recommendations. Might I seek the collective wisdom of the list? There are two things I'm seeking tech solutions for. The first is web app/widget/AJAX similar that allows users to make annotations on a web page, specifically finding aids in my case. I've already taken a look at Marginalia, but the demo had problems in Google Chrome. The other thing I need is an easy to use project management application, desktop or web. Does anyone have any favorites? Thank you Code4Lib! I hope to one day have enough tech skills to attend Code4Lib with pride! Right now, I'm about 2/3rds there ;) Nathan Tallman Associate Archivist American Jewish Archives
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Cindy, sorry, I realize that was vague. I have shell access on Site5, but since you're using shared resources, they monitor your CPU/memory usage. During high volume on a particular server, they'll kill processes that are running to make sure they can meet demands. This *could* happen when you're trying to compile something, which tends to be CPU-intensive, although it just depends. I've had their trigger kick in while trying to install ruby gems, although it's completely unpredictable (that is, based on all sorts of variables) - sometimes the gems install with no problem, other times they're killed. Compiling yaz is probably less of an issue (the makefile calls lots of things that run intensely, but quickly) than the pecl install of php/yaz. Running things in nice (http://linux.die.net/man/2/nice) probably helps your chances, but YMMV. I don't think this policy is exclusive to Site5; pretty much all of the major shared web hosting providers will have something similar in place, otherwise users could constantly have processes running in shells. Like I said, though, it shouldn't be a problem, it just might take a few tries (which will be less work, in the long run, than running your own VPS). -Ross. On Tue, Mar 8, 2011 at 10:05 AM, Cindy Harper char...@colgate.edu wrote: Sorry - what do you mean by triggers their usage monitor - CPU usage above a certain threshold? Or they don't allow compiles? I spoke with Bluehost, and they indicated that if I got SSH access, I could try to compile it myself. I'll check to see if this is possible with Lunarpages, which we now have accounts with. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 1:58 PM, Ross Singer rossfsin...@gmail.com wrote: Cindy, I think this might be possible, depending on the provider.
I have a site on Site5 and this seems pretty doable (it looks like I might have even tried this at some point, since I seem to have a compiled version of yaz in my home directory). It would probably take some rooting around in the forums to see how people are successfully installing PECL extensions and it might take a few tries to compile yaz successfully (since if it triggers their usage monitor, they'll kill the process), but I think it would be worth a shot. I would definitely recommend this before jumping to a VPS (and let's be realistic, everybody: if you're being this blasé about running a VPS, you are either investing some time/expertise sysadmining it or you have an insecure server waiting to be exploited). Good luck! -Ross. On Mon, Mar 7, 2011 at 1:17 PM, Cindy Harper char...@colgate.edu wrote: I guess I was hoping to have service such as that provided by my current hosting service, where security, etc., updates for LAMP are all taken care of by the host. Any recommendations along those lines? One that provides that and still lets me install what I want? My service suggested that I go to a VPS account, where I'd have to do my own updates. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 11:00 AM, Han, Yan h...@u.library.arizona.edu wrote: You can just buy a node from a variety of cloud providers such as Amazon EC2, Linode etc. (It is very easy to build anything you want). Yan -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 10:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] LAMP Hosting service that supports php_yaz? At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] facets in Atom feeds
So that seems to just be using the atom:category element, which is clever, but it wouldn't give you facet counts for the total results set (just for the returned page). It's possible to have categories across the entire result set (they'd be at the feed level, rather than the entry level), but you wouldn't have any counts or links for your filtered search results and you'd need some way to turn the scheme attribute into a facet field, although all of these are pretty easily achievable (they'd just really need an XML namespace and some consensus). Take: <category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/books/2008#volume'/> You could easily do something like: <category scheme='http://example.org/facets/fields#subject' term='History' ex:facetCount='1024' ex:href='http://example.org/search?q=your+search&amp;fct[subject]=History'/> or whatever. -Ross. On Thu, Mar 3, 2011 at 3:06 PM, Peter Murray peter.mur...@lyrasis.org wrote: That's pretty cool, but I had to fire up Parallels on my Mac to see it in MSIE. For those that may not have Windows readily available, this is what it looks like: http://twitpic.com/45r6sn Peter On Mar 3, 2011, at 1:51 PM, Jonathan Rochkind wrote: Someone recently on this list was saying something about ways to embed facets in for instance Atom feeds. I was reminded of that, because checking out an Atom feed from Google Books Data API, in Internet Explorer... Internet Explorer displays 'facet' type restrictions for it, under a heading Filter by category. It also displays sort options; apparently somehow the feed is advertising its sort options too in a way that a client like IE can act upon? Haven't looked into the details, but here's an example feed: http://books.google.com/books/feeds/volumes?q=LCCN07037314 Look at it in IE for instance. So whatever's being done here is apparently already somewhat standard, at least IE recognizes what Google does? I'd encourage SRU or whoever to follow their lead.
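Generating such an extended category element is trivial; here's a minimal Ruby sketch. The `ex:` namespace, the `facetCount`/`href` attribute names, and the URL pattern are the hypothetical extension proposed above, not a real Atom or Google Books vocabulary:

```ruby
# Build a hypothetical faceted atom:category element as a string. The ex:
# attributes are the invented extension from the email, not a real standard;
# a real feed would also need to declare the ex: namespace on the feed element.
def facet_category(scheme:, term:, count:, href:)
  %Q{<category scheme="#{scheme}" term="#{term}" } +
  %Q{ex:facetCount="#{count}" ex:href="#{href}"/>}
end

facet_category(scheme: 'http://example.org/facets/fields#subject',
               term: 'History', count: 1024,
               href: 'http://example.org/search?q=your+search&amp;fct[subject]=History')
```

(String interpolation is fine for a sketch; production code should XML-escape the attribute values.)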
[I agree that simply copying the Solr API for a standard like SRU is not the way to go -- Solr is an application that supports various low-level things that are not appropriate in that level of detail for a standard like SRU or what have you, at least not until they've been shown to be needed.] -- Peter Murray peter.mur...@lyrasis.org tel:+1-678-235-2955 Ass't Director, Technology Services Development http://dltj.org/about/ Lyrasis -- Great Libraries. Strong Communities. Innovative Answers. The Disruptive Library Technology Jester http://dltj.org/ Attrib-Noncomm-Share http://creativecommons.org/licenses/by-nc-sa/2.5/
Re: [CODE4LIB] GPL incompatible interfaces
On Fri, Feb 18, 2011 at 9:30 AM, Eric Hellman e...@hellman.net wrote: Since the Metalib API is not public, to my knowledge, I don't know whether it gets disclosed with an NDA. And you can't run or develop Xerxes without an ExLibris License, because it depends on a proprietary and unspecified data set. This is a very good point (and neither here nor there on the licensing issue). Ex Libris, in particular, has always had an awkward relationship between the NDA-for-customers-eyes-only policy regarding their X-Services documentation and their historic tolerance for open source applications built upon said services. The latter undermines the former significantly, since the documentation could theoretically be reverse-engineered if the open source projects' uses of it are comprehensive enough. I'll leave whether or not having an NDA on API documentation makes sense as an exercise for the reader. It does mean, however, that Ex Libris could at any point claim that these projects violate those terms, which is a risk, although probably a risk worth taking. On the opposite end of the spectrum, you have SirsiDynix, who refuse the distribution of applications written using their Symphony APIs to anybody but SD customers-in-good-standing-that-have-received-API-training. While SD's position is certainly draconian (and, in my opinion, rather counter-productive), it does let the developer know where she or he stands with no sense of ambiguity coming from the company. -Ross.
Re: [CODE4LIB] EZB
On Thu, Feb 17, 2011 at 11:16 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Interesting, does their link resolver API do article-level links, or just journal title level links? I/you/one could easily write a plugin for Umlaut for their API, which would be an interesting exercise. I think it would also be interesting to make the data available for download/reuse, if possible. -Ross. On 2/17/2011 1:18 AM, Markus Fischer wrote: The cheapest and best A to Z list i know is the german EZB: http://rzblx1.uni-regensburg.de/ezeit/index.phtml?bibid=A&colors=7&lang=en This list is maintained by hundreds of libraries. You just mark those journals you have licensed and that's it. Not very widely known: they do also provide an API which you can use as a free linkresolver. There are free tools you can plug into this API and you've got your linkresolver. The list is incredibly accurate and you'll have almost no effort: any change made by one library is valid for all. Let me know if you need more information. Markus Fischer Am 16.02.2011 22:18, schrieb Michele DeSilva: Hi Code4Lib-ers, I want to chime in and say that I, too, enjoyed the streaming archive from the conference. I also have a question: my library has a horribly antiquated A to Z list of databases and online resources (it's based in Access). We'd like to do something that looks more modern and is far more user friendly. I found a great article in the Code4Lib journal (issue 12, by Danielle Rosenthal & Mario Bernado) about building a searchable A to Z list using Drupal. I'm also wondering what other institutions have done as far as in-house solutions. I know there're products we could buy, but, like everyone else, we don't have much money at the moment. Thanks for any info or advice! Michele DeSilva Central Oregon Community College Library Emerging Technologies Librarian 541-383-7565 mdesi...@cocc.edu
Re: [CODE4LIB] Do you have Project Gutenberg (or other public domain e-books) MARC Records in your OPAC?
http://www.gutenberg.org/wiki/Main_Page Project Gutenberg is the place where you can download over 33,000 free ebooks to read on your PC, iPad, Kindle, Sony Reader, iPhone, Android or other portable device. Over 100,000 free ebooks are available through our Partners, Affiliates and Resources. http://www.gutenberg.org/wiki/Gutenberg:Partners%2C_Affiliates_and_Resources -Ross. On Thu, Feb 17, 2011 at 12:35 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hmm, what does ebook mean in this context exactly? Gutenberg has a heck of a lot more than 35k digital texts of books, I consider them all 'ebooks'. What does Gutenberg consider 'ebooks' exactly? On 2/17/2011 12:29 PM, Charles Ledvina wrote: Hello Matt: There are 35,224 records in this bzip file from Project Gutenberg: http://www.gutenberg.org/feeds/catalog.marc.bz2 from this page: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs It is their complete eBook collection and they say the file is updated daily. --Charles Ledvina On Wed, 16 Feb 2011 17:03:58 -0500, Matt Amory matt.am...@gmail.com wrote: If so can you send me a URL? Thanks much! Matt Amory On Wed, Feb 16, 2011 at 4:18 PM, Michele DeSilva mdesi...@cocc.edu wrote: Hi Code4Lib-ers, I want to chime in and say that I, too, enjoyed the streaming archive from the conference. I also have a question: my library has a horribly antiquated A to Z list of databases and online resources (it's based in Access). We'd like to do something that looks more modern and is far more user friendly. I found a great article in the Code4Lib journal (issue 12, by Danielle Rosenthal & Mario Bernado) about building a searchable A to Z list using Drupal. I'm also wondering what other institutions have done as far as in-house solutions. I know there're products we could buy, but, like everyone else, we don't have much money at the moment. Thanks for any info or advice! Michele DeSilva Central Oregon Community College Library Emerging Technologies Librarian 541-383-7565 mdesi...@cocc.edu
Re: [CODE4LIB] Unexpected ruby-marc behavior
No, that's expected behavior (and how it's always been). You'd need to do reader.rewind to put your enumerator cursor back to 0 to run back over the records. It's basically an IO object (since that's what it expects as input) and behaves like one. -Ross. On Thu, Jan 27, 2011 at 2:03 PM, Cory Rockliff rockl...@bgc.bard.edu wrote: So I was taking ruby-marc out for a spin in irb, and encountered a bit of a surprise. Running the following: require 'marc' reader = MARC::Reader.new('filename.mrc') reader.each {|record| puts record['245']} produces the expected result, but every subsequent call to reader.each {|record| puts record['245']} returns nil. Am I missing something obvious? I don't remember this being the case before. Thanks! Cory [running ruby-marc off the github repo / os x 10.6.5 / ruby 1.9.2 via rvm / rubygems via homebrew]
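Since MARC::Reader behaves like the IO it wraps, the effect is easy to demonstrate without the marc gem at all: a StringIO stands in for the reader here to show why the second pass comes back empty and how rewind fixes it.

```ruby
require 'stringio'

# MARC::Reader wraps an IO, so iteration moves a cursor: a second #each
# starts wherever the first one stopped. StringIO shows the same behavior
# (and the rewind fix) without needing the marc gem.
io = StringIO.new("record1\nrecord2\n")
first_pass  = io.each_line.map(&:chomp)  # => ["record1", "record2"]
second_pass = io.each_line.map(&:chomp)  # => [] -- cursor is already at EOF
io.rewind                                # put the cursor back to 0
third_pass  = io.each_line.map(&:chomp)  # => ["record1", "record2"] again
```

The same pattern applies to the original question: call `reader.rewind` between passes, or read the records into an array once and iterate over that.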
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
On Wed, Dec 15, 2010 at 10:20 AM, Karen Coyle li...@kcoyle.net wrote: Where I think we run into a problem is when we try to use FRBR as a record structure rather than conceptual guidance, which is what you allude to. This is the place where some implementations have decided to either merge Work and Expression or Expression and Manifestation because the Expression layer seems to make user displays more difficult. (I have also heard that the XC project found that putting the FRBR levels back together for display was inefficient.) Right, and I think this leads to all sorts of other ugliness, too: FRBR-izing aggregations of things (musical albums, anthologies, conference proceedings, etc.) is a potential UX nightmare (from both an end user AND data entry perspective). That said, there are enormous benefits to modeling these things, too: I am not suggesting we sweep them under the rug (which is basically what we've historically done), but somehow we're going to need to figure out an acceptable balance. -Ross.
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
On Tue, Dec 14, 2010 at 4:03 PM, McDonald, Stephen steve.mcdon...@tufts.edu wrote: I couldn't really say, and I'm not sure that it matters. Libraries have no need to worry about Works which have no Manifestation, so in practice I don't find it hard to recognize the Work-Manifestation relationship in the materials we actually work with. This is a pretty narrow view of what libraries need to worry about. There are lots of Works that have no Manifestations; antiquity is littered with them (references to things that only existed in the library of Alexandria, etc.). Just because they're not on our shelf (or any shelf) doesn't mean we shouldn't acknowledge them. -Ross.
Re: [CODE4LIB] code4lib 2011: Hotel registration
But... then I'd have to talk to a human being! -Ross. On Mon, Dec 13, 2010 at 1:06 PM, Ranti Junus ranti.ju...@gmail.com wrote: Folks, perhaps it'd be easier if you call the hotel instead, if the website doesn't work well. Please see the info from Andrew Darby below. thanks, ranti. -- Forwarded message -- From: Andrew Darby ada...@ithaca.edu Date: Mon, Dec 13, 2010 at 12:59 PM Subject: Re: Trouble with registration site To: code4lib...@googlegroups.com I'd add that I had problems with the hotel online reservation . . . fill out enormous form, get a no rooms available message, redo everything, repeat 3 or 4 times without success. Much easier to call the hotel directly . . . they were very nice. Phone: 812-856-6381 (but they might redirect you to another #) -- Bulk mail. Postage paid.
Re: [CODE4LIB] MARCXML - What is it for?
Alex, I think the problem is data like this: http://lccn.loc.gov/96516389/marcxml And while we can probably figure out a pattern to get the semantics out of this record, there is no telling how many other variations exist within our collections. So we've got lots of this data that is both hard to parse and, frankly, hard to find (since it has practically zero machine readable data in fields we actually use) and it needs to coexist with some newer, semantically richer format. What I'm saying is that the library's legacy data problem is almost to the point of being existential. This is certainly a detriment to forward progress. Analogously (although at a much smaller scale), my wife and I have been trying for about 2 years to move our checking account from our out of state bank to something local. The problem is that we have built up a lot of infrastructure around our old bank (direct deposit and lots of automatic bill pay, etc.): migration would not only be time consuming, but any mistakes could potentially be quite expensive, and we have a lot of uncertainty about how long it would actually take to migrate (and how that might affect the flow of payments, etc.). It's been, to date, easier for us just to drive across the state line (despite the fact that it's way out of our way to anywhere) rather than actually deal with it. In the meantime, more direct bill pay things have been set up and whatnot, making our eventual migration that much more difficult. I do think it would be useful to figure out what exactly in our legacy data is found only in libraries (that is, we could ditch this shoddy The Last Waltz record and pull the data from LinkedMDB or Freebase or somewhere) and determine the scale of the problem that only we can address, but even just this environmental scan is a fairly large undertaking. -Ross.
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote: Here, I think you're guilty of radically underestimating lots of people around the library world. No one thinks MARC is a good solution to our modern problems, and no one who actually knows what MARC is has trouble understanding MARC-XML as an XML serialization of the same old data -- certainly not anyone capable of meaningful contribution to work on an alternative. Slow down, Tex. Lots of people in the library world is not the same as developers, or even good developers, or even good XML developers, or even good XML developers who knows what the document model imposes to a data-centric approach. The problem we're dealing with is *hard*. Mind-numbingly hard. This is no justification for not doing things better. (And I'd love to know what the hard bits are; always interesting to hear from various people as to what they think are the *real* problems of library problems, as opposed to any other problem they have) The library world has several generations of infrastructure built around MARC (by which I mean AACR2), and devising data structures and standards that are a big enough improvement over MARC to warrant replacing all that infrastructure is an engineering and political nightmare. Political? For sure. Engineering? Not so much. This is just that whole blinded by MARC issue that keeps cropping up from time to time, and rightly so; it is truly a beast - at least the way we have come to know it through AACR2 and all its friends and its death-defying focus on all things bibliographic - that has paralyzed library innovation, probably to the point of making libraries almost irrelevant to the world. 
I'm happy to take potshots at the RDA stuff from the sidelines, but I never forget that I'm on the sidelines, and that the people active in the game are among the best and brightest we have to offer, working on a problem that invariably seems more intractable the deeper in you go. Well, that's a pretty scary sentence, for all sorts of reasons, but I think I shall not go there. If you think MARC-XML is some sort of an actual problem What, because you don't agree with me the problem doesn't exist? :) and that people just need to be shouted at to realize that and do something about it, then, well, I think you're just plain wrong. Fair enough, although you seem to be under the assumption that all of the stuff I'm saying is a figment of my imagination (I've been involved in several projects lambasted because managers think MARCXML is solving some imaginary problem; this is not bullshit, but pain and suffering from the battlefields of library development), that I'm not one of those developers (or one of you, although judging from this discussion it's clear that I am not), that the things I say somehow doesn't apply because you don't agree with, umm, what I'm assuming is my somewhat direct approach to stating my heretic opinions. Alex -- Project Wrangler, SOA, Information Alchemist, UX,
Re: [CODE4LIB] Looking for OAuth experts
On Thu, Oct 14, 2010 at 11:11 AM, MJ Ray m...@phonecoop.coop wrote: Ross Singer wrote: Unlike Twitter, however, we're starting from nothing. There's nothing currently invested in ILS-DI clients that would break by committing solely to OAuth (or anything, for that matter). Are you sure there's nothing currently invested? I thought the Koha community was already implementing ILS-DI so I assume there's some client using it, as people don't tend to fund useless developments. I don't remember if any of the co-op's client libraries are using it yet, though. I am pretty certain of this. The current group is focusing on a different set of functionality (primarily around borrower account services) than the DLF group got to (which was about harvesting bib records and limited item availability support). In some ways, however, any answer I give here is correct. The DLF group provided no specifics on how to implement their functionality. HarvestBibliographicRecords could be provided via OAI-PMH or Atom; they provided a new XML format for including holdings availability, but no specification on how it would be delivered, etc. That is, the DLF ILS-DI provided guidelines of functionality that needed to be present, but not a specification on how it needed to operate (they did give recommendations). So any Koha implementation would just be an interpretation of these guidelines, but there was no specification that anyone can point to to say that an implementation is compliant. [ILS-DI] It's no longer under the auspices of the DLF and the priority of functionality has changed. [...] OK, if it's no longer under the auspices of the DLF are you still in contact with BibLibre? They are more than welcome to participate. It's not a closed process. Indeed, and I hope the reply was likewise helpful. It was. More answers than questions, which is always good! That said, I'm still not seeing the benefits of OAuth for ILS-DI compared to existing HTTP authentication and authorization methods, really.
Ok, so let me provide you with a use case: Imagine a vendor-hosted discovery service (EBSCO Discovery Service, Worldcat Local or Summon, for example, so we're not talking about any sort of 'edge case'). To use HTTP authentication, one of the following scenarios would need to be true:

- they would need to have access to a user's credentials (which is a non-starter in many places)
- they would need to be a trusted superuser of the ILS-DI API service (they authenticate elsewhere, say an SSO, then can perform lookups as anybody)
- some kind of token based access would need to be established between the discovery layer and the ILS-DI API

The last scenario is exactly what OAuth standardizes, so we're not rolling our own, niche security protocol. If you want another use case, imagine a service such as LibraryElf (http://www.libraryelf.com/). A protocol like OAuth would allow you to share your borrower account with a (useful, I think!) service like this *without* handing over your user credentials. One can also imagine all sorts of interesting services cropping up in places like LibraryThing about what you've currently got checked out, placing holds on books recommended for you, etc. HTTP Authentication/Authz pretty much assumes all services will be provided locally, which I think is a fairly antiquated assumption. -Ross. Regards, -- MJ Ray (slef), member of www.software.coop, a for-more-than-profit co-op. Past Koha Release Manager (2.0), LMS programmer, statistician, webmaster. In My Opinion Only: see http://mjr.towers.org.uk/email.html Available for hire for Koha work http://www.software.coop/products/koha
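The token-based piece OAuth standardizes can be sketched concretely. Below is a minimal Ruby implementation of the OAuth 1.0 (RFC 5849) signature step: the client proves it holds a token secret by signing each request, so the discovery layer never handles the patron's ILS password. The parameter values are purely illustrative, and this omits nonces, timestamps, and the token-granting dance itself.

```ruby
require 'openssl'

# RFC 3986 percent-encoding as required by OAuth 1.0 (unreserved chars only).
# ASCII-only sketch; multibyte input would need per-byte encoding.
def oauth_percent_encode(s)
  s.to_s.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# HMAC-SHA1 request signature per RFC 5849: the base string is the method,
# URL, and sorted params; the key combines the two shared secrets. The
# secrets and params below are illustrative, not from a real ILS-DI service.
def oauth_signature(method, url, params, consumer_secret, token_secret)
  normalized = params.sort.map { |k, v|
    "#{oauth_percent_encode(k)}=#{oauth_percent_encode(v)}"
  }.join('&')
  base = [method.upcase, url, normalized]
           .map { |part| oauth_percent_encode(part) }.join('&')
  key  = "#{oauth_percent_encode(consumer_secret)}&#{oauth_percent_encode(token_secret)}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base)
  [digest].pack('m0') # base64, no trailing newline
end

sig = oauth_signature('GET', 'http://ils.example.org/ilsdi/patron',
                      { 'oauth_token' => 'patron-token' },
                      'consumer-secret', 'token-secret')
```

The point for the use cases above: the discovery service stores only the token and token secret, which the patron can revoke at the ILS without ever changing (or revealing) their actual credentials.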
Re: [CODE4LIB] simple,flexible ILS for a small library.
You know, with Jonathan's rephrasing (if it's accurate), it crossed my mind that most ILSes that support course reserves should be able to handle this. It's extremely common for course reserves to belong to the instructor that is putting them on reserve and the ILS would need to keep track of that to return said materials back to the lending instructor. Now, I have no idea if most reserves departments do this via notes in the record or whatnot, but it might at least be a model for how this could work with a traditional ILS. Since I neither work for a library nor work for a vendor whose ILS supports course reserves (since that model doesn't really exist in the UK, apparently), I can't actually confirm how this works in practice. But I'm guessing somebody on this list can. -Ross. On Mon, Oct 4, 2010 at 4:39 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Are any currently existing open source ILSs flexible enough to support this model? I kind of doubt it. What are you are doing sounds neat, but is not typical library workflow. Tell me if I'm re-describing what you're talking about correctly: Every book in the library essentially belongs to one of the patrons. Patrons can both borrow books, and loan books to other patrons. The library is basically just a facilitator of patron-to-patron lending. So you need to know what books are out that are owned by a certain patron, as well as what books are being borrowed by a certain patron. You need to know what books are over-due that are owned by a certain patron, etc. Creating a location, branch or collection code for each patron is going to be un-manageable with more than a few dozen patrons. I don't think most existing ILS systems -- open source or not -- are going to be set up to handle that system. On the other hand, many existing ILS systems are going to have all sorts of stuff you _don't_ need, like acquisitions, and serials tracking, and such. 
I wonder if you are better served looking for software that is NOT library software to handle the actual circulation. Maybe there is some non-library software that is designed for a network of people lending stuff to each other? And then you could always put a Solr-based discovery system on top of that for actual _finding_ of books available to be borrowed, perhaps using VuFind or Blacklight or rolling your own. But the underlying tracking of circulation is actually the tricky part -- perhaps write your own custom software for that, if nothing open source can be found, but then export all items to a separate Solr-based component for the actual search engine. Jonathan ... wrote: Reading my original post, perhaps I should have made the important point more clear. My question is about an ILS suitable for a library that does not own its books, but is borrowing those books from patrons. The books all have lease end dates associated with them. Book lenders are very similar to book borrowers, and they require end of day processing to see if any of the library's books are due back to them, in the same way borrowers' books are due back to the library. So, in the last two posts which mentioned simple borrowing, that is what I am wanting, but for the library to be simply borrowing the books AND for patrons to simply borrow those same books out of the library. Book lenders and book borrowers are essentially the same, except lenders first check a book in, and the due date is when the book leaves the library, and book borrowers check books out and then back in again. Of course, many book borrowers are also lenders. Are any currently existing open source ILSs flexible enough to support this model? Sorry for the confusion, Elliot
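Elliot's model (the library borrows every book from a patron, then lends it out again) is small enough to sketch directly; this is a hypothetical toy data model for illustration, not a schema from any real ILS:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Book:
    title: str
    owner_id: str                       # the patron who lent the book to the library
    lease_end: date                     # when the book is due back to its owner
    borrower_id: Optional[str] = None   # patron currently borrowing it, if any

@dataclass
class Library:
    books: dict = field(default_factory=dict)

    def check_in_from_owner(self, book_id, book):
        # a lender "first checks a book in", as Elliot describes
        self.books[book_id] = book

    def check_out(self, book_id, patron_id):
        self.books[book_id].borrower_id = patron_id

    def check_in(self, book_id):
        self.books[book_id].borrower_id = None

    def due_back_to_owners(self, today):
        # end-of-day processing: leases expiring on or before today
        return [bid for bid, b in self.books.items() if b.lease_end <= today]

    def owned_by(self, patron_id):
        return [bid for bid, b in self.books.items() if b.owner_id == patron_id]
```

End-of-day processing is then just a scan over lease_end dates, mirroring the due-date scan an ILS already does for ordinary borrowers.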
Re: [CODE4LIB] simple,flexible ILS for a small library.
I think your functional requirement that made this non-trivial was your mention of it needing ILL functionality. There's a definite threshold that has to be crossed before you start seeing something like that integrated into the ILS. If you've got some other way to deal with ILL, I'd suggest OpenBiblio (http://obiblio.sourceforge.net/) as a super simple, super basic ILS. It deals with inventory, borrowers, circulation, etc. but nothing terribly sophisticated. You could use it with VuFind via the Jangle connector: http://jangle.googlecode.com/svn/trunk/connectors/openbiblio/ -Ross. On Mon, Sep 27, 2010 at 6:15 PM, ... offonoffoffon...@gmail.com wrote: Hello, Some folks in the VuFind library suggested I ask here. We are starting a small library and thinking of using VuFind as our online catalog. As for the ILS we would like something small and simple (Evergreen and others seem massive for the small amount of functionality we need), and especially something which is flexible enough to allow us to base our library on book sharing rather than an institutionally owned collection. Book sharing will probably happen through creative use of inter-library loan functionality, and so an ILS that has a solid and flexible ILL is necessary. We will probably have less than 1,000 books (perhaps a couple thousand if things really take off) and less than 100 borrowers. Probably about as many book sharers (ie, partner libraries) as borrowers. Does anyone have experience with the ILSes which already have drivers for VuFind? I think the list is:
* SirsiDynix Horizon
* Sirsi Unicorn
* Voyager
* VTLS Virtua
* Innovative
* DAIA / OCLC PICA
* NewGenLib
If you can, please comment on the suitability of these ILSes for our system (low complexity and flexible ILL system). I have considered just writing some administrative scripts in python. It would be a good project and not ridiculously difficult. 
I would much rather write the whole thing from scratch than try to write a VuFind driver for an ILS not yet supported. Thanks for reading, Elliot
Re: [CODE4LIB] Looking for OAuth experts
On Mon, Sep 20, 2010 at 4:01 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Can you give some details (or references) to justify the belief that OAuth isn't ready yet? (The fact that Twitter implemented it poorly does not seem apropos to me, that's just a critique of Twitter, right?). I don't agree or disagree, just trying to take this from fud-ish rumor to facts to help me and others understand and make decisions. Agreed on this assessment, Jonathan. MJ, can you extrapolate on your concerns, because that Ars Technica article is not going to cut it for anything more than to avoid the choices that Twitter made. And even by the standards of that article, I'm not sure that OAuth is inappropriate for the ILS-DI's use cases which are: 1) server-to-server communication as the first priority 2) something relatively standardized and abstracted enough to allow for institutions' local authentication mechanisms. To quote from that article: To be clear, I don't think that OAuth is a failure or a dead end. I just don't think that it should be treated as an authentication panacea to the detriment of other important security considerations. What it comes down to is that OAuth 1.0a is a horrible solution to a very difficult problem. It works acceptably well for server-to-server authentication, but there are far too many unresolved issues in the current specification for it to be used as-is on a widespread basis for desktop applications. It's simply not mature enough yet. Even in the context of server-to-server authentication, OAuth should be viewed as a necessary evil rather than a good idea. It should be approached with extreme trepidation and the high level of caution that is warranted by such a convoluted and incomplete standard. Careless adoption can lead to serious problems, like the issues caused by Twitter's extremely poor implementation. 
As I have written in the past, I think that OAuth 2.0—the next version of the standard—will address many of the problems and will make it safer and more suitable for adoption. The current IETF version of the 2.0 draft still requires a lot of work, however. It still doesn't really provide guidance on how to handle consumer secret keys for desktop applications, for example. In light of the heavy involvement in the draft process by Facebook's David Recordon, I'm really hopeful that the official standard will adopt Facebook's sane and reasonable approach to that problem. Which basically spells out the problem the ILS-DI group is facing: an incomplete, but evolving standard with heavy industry support, or... nothing. We are still very much in the fact-gathering stage, so any suggestions are welcome. At the glacial pace of library development, I think it's safe to assume OAuth 2.0 will be less of a moving target by any implementation stage. -Ross.
Re: [CODE4LIB] Looking for OAuth experts
On Mon, Sep 20, 2010 at 5:21 PM, MJ Ray m...@phonecoop.coop wrote: Ross Singer wrote: Agreed on this assessment, Jonathan. MJ, can you extrapolate on your concerns, because that Ars Technica article is not going to cut it for anything more than to avoid the choices that Twitter made. I've just sent another message trying to do that. Hope it helps. Yes. Well, at any rate it helps me refine my problem statement some. The concern about distributed apps (while legitimate) doesn't worry me quite so much in this particular case. The main use case we would be looking to solve is for known applications to access (and, depending on how trusted they are, manipulate) confidential (again, according to the level of trust) user information in an ILS without needing to store their credentials. If there is an added bonus of being able to use it for all sorts of other, distributed applications, so much the better, but if that's not viable or secure, it's no problem since it would be outside of the necessary requirements, anyway. And even by the standards of that article, I'm not sure that OAuth is inappropriate for the ILS-DI's use cases which are: 1) server-to-server communication as the first priority 2) something relatively standardized and abstracted enough to allow for institutions' local authentication mechanisms. I think FOSS servers would be affected by the published-key spoofing flaw too, wouldn't they? There are open source OAuth server implementations out there, I assume there's some local salt-ing going on. Another key difference between ILS-DI's use case and a service like Twitter's is that, from the start, the expectation can be set that only whitelisted clients have access. I'm not sure if Johns Hopkins or Stanford or NYPL cares much if there's a teeming app marketplace that can be built on top of their ILS API as much as simple and consistent access from their discovery interfaces, courseware, electronic reserves application, etc. 
The very attributes that may make OAuth questionable for services like Twitter, Facebook, and their ilk may be non-factors for an ILS API simply because the environment can be much more controlled. The problem would be that if, indeed, these flaws do undermine public support for OAuth, the advantages it brings (client/server libraries, awareness outside of very library-specific domains) would be lost if there's no community using it. Some of the projects that want to support ILS-DI are FOSS - one of the Koha support companies signed some ILS-DI announcement IIRC, while another wrote some of the code to implement it. Which basically spells out the problem the ILS-DI group is facing: an incomplete, but evolving standard with heavy industry support, or... nothing. Glad to see it's recognised that OAuth is incomplete. Really all that's recognized is that it exists and is one of the only, if not the only, protocol that allows for the decentralization of auth/authz without the client service needing to manage personal credentials. That's not necessarily an ILS-DI requirement, but it sure would be useful if we had it. I've heard as much opposition as support among developers. On the one hand, it's more work to sell. On the other, they're now even more at the mercy of big service providers who can break their applications (and so eat their support budgets) at will. Unlike Twitter, however, we're starting from nothing. There's nothing currently invested in ILS-DI clients that would break by committing solely to OAuth (or anything, for that matter). If there is broad language support to build clients and servers, this should be less of an issue. We are still very much in the fact-gathering stage, so any suggestions are welcome. [...] If the problem that the group is trying to solve was explained on this list, readers might be able to offer suggestions. Jonathan gave a pretty good summary, but I'll tack on. 
The ILS-DI initiative was initially proposed by the Digital Library Federation to provide the following functionality out of integrated library systems:

Level 1: Basic Discovery Interfaces
* HarvestBibliographicRecords
* HarvestExpandedRecords
* GetAvailability
* GoToBibliographicRequestPage

Level 2: Elementary OPAC supplement
All of the above, plus
* HarvestAuthorityRecords
* HarvestHoldingsRecords
* GetRecord
* Search
* Scan
* GetAuthorityRecords
* Either OutputRewritablePage or OutputIntermediateFormat

Level 3: Elementary OPAC alternative
All of the above, plus
* LookupPatron
* AuthenticatePatron
* GetPatronInfo
* GetPatronStatus
* GetServices
* RenewLoan
* HoldTitle
* HoldItem
* CancelHold
* RecallItem
* CancelRecall

Level 4: Robust/domain specific discovery platforms
All of the above, plus
* SearchCourseReserves
* Explain
* Both OutputRewritablePage and OutputIntermediateFormat

It's no longer under the auspices of the DLF and the priority of functionality has changed. We're now focused first
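The level structure above is cumulative, which can be captured directly as data; this is a hypothetical sketch (it simplifies the either/or choice of OutputRewritablePage vs. OutputIntermediateFormat at level 2):

```python
# Hypothetical sketch: the DLF ILS-DI levels as cumulative sets of functions.
ILS_DI_LEVELS = {
    1: {"HarvestBibliographicRecords", "HarvestExpandedRecords",
        "GetAvailability", "GoToBibliographicRequestPage"},
    2: {"HarvestAuthorityRecords", "HarvestHoldingsRecords", "GetRecord",
        "Search", "Scan", "GetAuthorityRecords", "OutputIntermediateFormat"},
    3: {"LookupPatron", "AuthenticatePatron", "GetPatronInfo",
        "GetPatronStatus", "GetServices", "RenewLoan", "HoldTitle",
        "HoldItem", "CancelHold", "RecallItem", "CancelRecall"},
    4: {"SearchCourseReserves", "Explain", "OutputRewritablePage"},
}

def compliance_level(implemented):
    """Highest level whose cumulative function set is fully implemented."""
    level = 0
    required = set()
    for n in sorted(ILS_DI_LEVELS):
        required |= ILS_DI_LEVELS[n]
        if required <= set(implemented):
            level = n
    return level
```

Of course, as the thread notes, the DLF document specified functionality but not bindings, so "compliance" here means only that the named functions exist in some form.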
Re: [CODE4LIB] content type for rdf
It depends on how you're serving your RDF.

* RDF/XML is application/rdf+xml
* N3 is text/n3;charset=utf-8
* Turtle is text/turtle
* NTriples are text/plain

-Ross. On Fri, Aug 20, 2010 at 7:03 AM, Eric Lease Morgan emor...@nd.edu wrote: I am in the process of creating sets of cool URLs, and I need to know the best (correct) content type of RDF. Is it application/rdf+xml? Similarly, is the correct content type for HTML equal to text/html? -- Eric Morgan
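Tying those together, a minimal content-negotiation sketch (the helper and format keys are hypothetical; the MIME types are the ones Ross lists):

```python
# MIME types from the post; the negotiation helper itself is a hypothetical sketch.
RDF_CONTENT_TYPES = {
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3;charset=utf-8",
    "turtle": "text/turtle",
    "ntriples": "text/plain",
}

def pick_serialization(accept_header):
    """Return (format, content_type) for the first supported type in Accept."""
    for part in accept_header.split(","):
        mime = part.split(";")[0].strip()   # drop q-values and parameters
        for fmt, ctype in RDF_CONTENT_TYPES.items():
            if ctype.split(";")[0] == mime:
                return fmt, ctype
    return "rdfxml", RDF_CONTENT_TYPES["rdfxml"]  # fall back to RDF/XML
```

A "cool URL" setup would typically 303-redirect from the thing's URI to the document URI chosen by negotiation like this.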
Re: [CODE4LIB] open source proxy packages?
On Sun, Aug 15, 2010 at 8:10 PM, Cary Gordon listu...@chillco.com wrote: In my experience, I haven't found anything that is as easy to use (or even close) to EZProxy. Unless you value your time at under $5/hr., or are a FLOSS zealot (I think of myself as a semi-zealot), it is a bargain at $500. A significant part of that bargain is the great community that supports it. +1 I was trying to figure out how to word this in a way that wasn't discouraging or too EZProxy fanboy-ish, but I honestly could not see an alternative that, in the end, would be nearly as cost effective as EZProxy. EZProxy's (lifetime!) price, tiny footprint and support are going to be hard to beat. Art Rhyno once attributed Shibboleth's lack of uptake on EZProxy: why bother with this ultra-complicated authentication/authorization mechanism when EZProxy just works, on unlimited machines, with unlimited upgrades, for $500? -Ross. This is not to say, or course, that it is not possible to do this with Squid, which was the first effective solution, or other tools. Cary On Sat, Aug 14, 2010 at 10:05 AM, phoebe ayers phoebe.w...@gmail.com wrote: Hello all, Are there any open source proxies for libraries that have been developed, e.g. an open source alternative to EZProxy or similar? I'm working with a non-profit tech foundation that is interested in granting access to a few licensed resources to a few hundred people who are scattered around the world. thanks, Phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * -- Cary Gordon The Cherry Hill Company http://chillco.com
Re: [CODE4LIB] schema for some web page
http://dublincore.org/documents/dcmi-terms/#terms-relation This term is intended to be used with non-literal values as defined in the DCMI Abstract Model (http://dublincore.org/documents/abstract-model/). As of December 2007, the DCMI Usage Board is seeking a way to express this intention with a formal range declaration. So if you use the dcterms namespace (rather than dc elements) you should be fine. -Ross. On Thu, Jul 8, 2010 at 9:48 AM, Jonathan Rochkind rochk...@jhu.edu wrote: In my experience, you can't tell much about what you'd really want to know for user needs from the indicators or subfield 3's, at least in my catalog. FRBR relationships probably don't work because the destination of an arbitrary 856 is not necessarily a FRBR entity, and even if it is there's no way to know that (or what class of entity) from the data. It really is just a generic "some kind of related web page". So dc:relation does sound like the right vocabulary element for a generic "related web page", thanks. Is the value of dc:relation _necessarily_ a URI/URL? I hope so, because otherwise I'm not sure dc:relation is sufficient, as I really do need something that says "some related URL". Thanks for the advice, Jonathan Ed Summers wrote: On Wed, Jul 7, 2010 at 7:00 PM, Doran, Michael D do...@uta.edu wrote: Of course, subfield $3 values are not any kind of controlled vocabulary, so it's hard to do much with them programmatically. A few years ago I analyzed the subfield 3 values in the Library of Congress data up at the Internet Archive [1]. Of course it's really simple to extract, but I just pushed it up to GitHub, mainly to share the results [2]. I extracted all the subfield 3 values from the 12M? records, and then counted them up to see how often they repeated [3]. As you can see it's hardly controlled, but it might be worthwhile coming up with some simple heuristics and properties for the familiar ones: you could imagine dcterms:description being used for "Publisher description", etc. 
Of course the $3 in your catalog data might be different from LCs, but maybe we could come up with a list of common ones on a wiki somewhere, and publish a little vocabulary that covered the important relations? //Ed [1] http://www.archive.org/details/marc_records_scriblio_net [2] http://github.com/edsu/beat [3] http://github.com/edsu/beat/raw/master/types.txt
[CODE4LIB] MARC Codes for Forms of Musical Composition
Hi everybody, I just wanted to let people know I've made the MARC codes for forms of musical compositions (http://www.loc.gov/standards/valuelist/marcmuscomp.html) available as Music Ontology Genres (http://purl.org/ontology/mo/) at http://purl.org/NET/marccodes/muscomp/ They follow the same naming convention as they would in the MARC 008 or 047, so it's easy to map (that is, no lookup needed) from your MARC data: http://purl.org/NET/marccodes/muscomp/sy#genre etc. The RDF is available as well: http://purl.org/NET/marccodes/muscomp/sy.rdf I'd love any feedback/suggestions/corrections/etc. Also, you can look around to see MARC country codes, geographic area codes and language codes. Eventually I would like to get all of the MARC codes (not already modeled by LC) in there (http://www.loc.gov/standards/valuelist/). Thanks, -Ross.
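Since the URIs follow the 008/047 codes directly, the mapping really is just string concatenation; a small sketch (the embedded code list is a tiny illustrative subset of the full MARC form-of-composition list):

```python
BASE = "http://purl.org/NET/marccodes/muscomp/"

# A few codes from the MARC form-of-composition list, for illustration;
# the full list is at http://www.loc.gov/standards/valuelist/marcmuscomp.html
KNOWN_CODES = {"sy": "Symphonies", "op": "Operas", "fg": "Fugues", "jz": "Jazz"}

def muscomp_uri(code):
    """Build the genre URI for a two-letter MARC form-of-composition code."""
    code = code.lower()
    if code not in KNOWN_CODES:
        raise ValueError("unknown form-of-composition code: %r" % code)
    return "%s%s#genre" % (BASE, code)
```

So a record with 008/18-19 (or 047 $a) of "sy" links straight to http://purl.org/NET/marccodes/muscomp/sy#genre with no lookup table beyond validation.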
Re: [CODE4LIB] Planet Code4Lib RSS feed
There seems like there's a bad entry in there: http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fplanet.code4lib.org%2Fatom.xml from the C4L Journal's feed which may be screwing up the aggregated feed as a whole. -Ross. On Mon, Jun 28, 2010 at 11:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Code4libbers, anyone want to help out debugging this? I'm kind of the 'steward' of the planet code4lib, but haven't really spent much time with it understanding it technically, and won't really have any time to look at it for a while, I'm kind of swamped at work. Jonathan Steve Casburn wrote: Jonathan, Eric Lease Morgan suggested that you might be the right person to report this to... The RSS feed (in all three flavors) for Planet Code4Lib seems to have been down since June 12. I have received no new posts during that time in my newsreader (NewsFire for Mac OS X), and the webpage for each version of the RSS feed are blank. Steve
Re: [CODE4LIB] MODS and DCTERMS
On Tue, May 4, 2010 at 7:55 AM, Mike Taylor m...@indexdata.com wrote: Having read the rest of this thread, I find that nothing that's been said changes my initial gut reaction on reading this question: DO NOT USE DCTERMS. Its vocabulary is Just Plain Inadequate, and not only for esoteric cases like the Alternative Chronological Designation of First Issue or Part of Sequence field that Karen mentioned. Despite having 70 (seventy!) elements, it's lacking fundamental fields for describing articles in journals -- there are no journalTitle, volume, issue, startPage or endPage fields. That, for me, is a deal-breaker. If you're using Dublin Core as XML, I agree with this. If you're using Dublin Core as RDF (which is, honestly, the only thing it's really good for), this is a non-issue. -Ross.
Re: [CODE4LIB] MODS and DCTERMS
On Tue, May 4, 2010 at 10:26 AM, Mike Taylor m...@indexdata.com wrote: Ross, I think that got mangled in the sending -- either that, or it's some strange format that I've never seen before. That said, I am tremendously impressed by all the information you obtained there. What software did you use, how much of this did you have to feed it by hand, and how much did it intuit from existing structured datasets? Oh, that's probably not mangled, that's probably just how Turtle looks :) I'll also send it as RDF/XML. That graph was compiled by a Google Scholar search on Mike Taylor dinosaur, the Ingenta page describing your article, a text editor (TextMate) and 30 minutes of my life I'll never get back. Ok, here's the graph as RDF/XML:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <bibo:AcademicArticle rdf:nodeID="article1">
    <dcterms:abstract xml:lang="en">Xenoposeidon proneneukos gen. et sp. nov. is a neosauropod represented by BMNH R2095, a well-preserved partial mid-to-posterior dorsal vertebra from the Berriasian-Valanginian Hastings Beds Group of Ecclesbourne Glen, East Sussex, England. It was briefly described by Lydekker in 1893, but it has subsequently been overlooked. This specimen's concave cotyle, large lateral pneumatic fossae, complex system of bony laminae and camerate internal structure show that it represents a neosauropod dinosaur. However, it differs from all other sauropods in the form of its neural arch, which is taller than the centrum, covers the entire dorsal surface of the centrum, has its posterior margin continuous with that of the cotyle, and slopes forward at 35 degrees relative to the vertical. Also unique is a broad, flat area of featureless bone on the lateral face of the arch; the accessory infraparapophyseal and postzygapophyseal laminae which meet in a V; and the asymmetric neural canal, small and round posteriorly but large and teardrop-shaped anteriorly, bounded by arched supporting laminae. The specimen cannot be referred to any known sauropod genus, and clearly represents a new genus and possibly a new `family'. Other sauropod remains from the Hastings Beds Group represent basal Titanosauriformes, Titanosauria and Diplodocidae; X. proneneukos may bring to four the number of sauropod `families' represented in this unit. Sauropods may in general have been much less morphologically conservative than is usually assumed. Since neurocentral fusion is complete in R2095, it is probably from a mature or nearly mature animal. Nevertheless, size comparisons of R2095 with corresponding vertebrae in the Brachiosaurus brancai holotype HMN SII and Diplodocus carnegii holotype CM 84 suggest a rather small sauropod: perhaps 15 m long and 7600 kg in mass if built like a brachiosaurid, or 20 m and 2800 kg if built like a diplodocid.</dcterms:abstract>
    <dcterms:creator rdf:nodeID="author1"/>
    <dcterms:creator rdf:nodeID="author2"/>
    <dcterms:isPartOf rdf:nodeID="journal1"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYearMonth">2007-11</dcterms:issued>
    <dcterms:language rdf:resource="http://purl.org/NET/marccodes/languages/eng#lang"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85038094#concept"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85097127#concept"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85117730#concept"/>
    <dcterms:title xml:lang="en">AN UNUSUAL NEW NEOSAUROPOD DINOSAUR FROM THE LOWER CRETACEOUS HASTINGS BEDS GROUP OF EAST SUSSEX, ENGLAND</dcterms:title>
    <bibo:authorList>
      <rdf:Description>
        <rdf:first rdf:nodeID="author1"/>
        <rdf:rest>
          <rdf:Description>
            <rdf:first rdf:nodeID="author2"/>
            <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
          </rdf:Description>
        </rdf:rest>
      </rdf:Description>
    </bibo:authorList>
    <bibo:doi>10.1111/j.1475-4983.2007.00728.x</bibo:doi>
    <bibo:issue rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">6</bibo:issue>
    <bibo:numPages rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">18</bibo:numPages>
    <bibo:pageEnd rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1564</bibo:pageEnd>
    <bibo:pageStart rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1547</bibo:pageStart>
    <bibo:pages>1547-1564</bibo:pages>
    <bibo:volume rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">50</bibo:volume>
  </bibo:AcademicArticle>
  <bibo:Journal rdf:nodeID="journal1">
    <dcterms:publisher rdf:nodeID="publisher1"/>
    <dcterms:title>Palaeontology</dcterms:title>
    <bibo:issn>0031-0239</bibo:issn>
    <foaf:homepage rdf:resource="http://www3.interscience.wiley.com/journal/118531917/home?CRETRY=1&amp;SRETRY=0"/>
  </bibo:Journal>
</rdf:RDF>
Re: [CODE4LIB] It's cool to love milk and cookies
But is there a NISO standard for this? On Fri, Apr 30, 2010 at 7:13 PM, Simon Spero s...@unc.edu wrote: I like chocolate milk.
Re: [CODE4LIB] MODS and DCTERMS
Out of curiosity, what is your use case for turning this into DC? That might help those of us that are struggling to figure out where to start with trying to help you with an answer. -Ross. On Mon, May 3, 2010 at 11:46 AM, MJ Suhonos m...@suhonos.ca wrote: Thanks for your comments, guys. I was beginning to think the lack of response indicated that I'd asked something either heretical or painfully obvious. :-) That's my understanding as well. oai_dc predates the defining of the 15 legacy DC properties in the dcterms namespace, and it's my guess nobody saw a reason to update the oai_dc definition after this happened. This is at least part of my use case — we do a lot of work with OAI on both ends, and oai_dc is pretty limited due to the original 15 elements. My thinking at this point is that there's no reason we couldn't define something like oai_dcterms and use the full QDC set based on the updated profile. Right? FWIW, I'm not limited to any legacy ties; in fact, my project is aimed at pushing the newer, DC-sanctioned ideas forward, so I suspect in my case using an XML serialization that validates against http://purl.org/dc/terms/ is probably sufficient (whether that's RDF or not doesn't matter at this point). So, back to the other part of the question: has anybody seen a MODS — DCTERMS crosswalk in the wild? It looks like there's a lot of similarity between the two, but before I go too deep down that rabbit hole, I'd like to make sure someone else hasn't already experienced that, erm, joy. MJ
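A hypothetical oai_dcterms container along the lines MJ suggests might be serialized like this; the container namespace URI below is invented for illustration, since no such format has been registered:

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical container namespace -- no oai_dcterms format actually exists.
OAI_DCTERMS = "http://example.org/OAI/2.0/oai_dcterms/"

def oai_dcterms_record(fields):
    """Serialize {property: [values]} into a hypothetical oai_dcterms
    container, analogous to oai_dc's <oai_dc:dc> wrapper."""
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.Element("{%s}dcterms" % OAI_DCTERMS)
    for prop, values in sorted(fields.items()):
        for value in values:
            el = ET.SubElement(root, "{%s}%s" % (DCTERMS, prop))
            el.text = value
    return ET.tostring(root, encoding="unicode")

xml_out = oai_dcterms_record({"title": ["Xenoposeidon"], "issued": ["2007-11"]})
```

The appeal is that the full set of dcterms properties (and their refinements) becomes available where oai_dc allows only the legacy 15.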
Re: [CODE4LIB] it's cool to hate on OpenURL
On Fri, Apr 30, 2010 at 4:09 AM, Jakob Voss jakob.v...@gbv.de wrote: Am I right that neither OpenURL nor COinS strictly defines a metadata model with a set of entities/attributes/fields/you-name-it and their definition? Apparently all ContextObjects metadata formats are based on non-normative implementation guidelines only ?? You are right. Z39.88 (and, by extension, COinS) really only defines the ContextObject itself. So it defines the carrier package, its administrative elements, referents, referrers, referring entities, services, requester and resolver and their transports. It doesn't really specify what should actually go into any of those slots. The idea is that it defers to the community profiles for that. In the XML context object, you can send more than one metadata-by-val element (or metadata-by-ref) per entity (ref, rfr, rfe, svc, req, res) - I'm not sure what is supposed to happen, for example, if you send a referent that has multiple MBV elements that don't actually describe the same thing. -Ross.
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
On Fri, Apr 30, 2010 at 7:59 AM, Kyle Banerjee kyle.baner...@gmail.com wrote: An obvious thing for a resolver to be able to do is return results in JSON so the OpenURL can be more than a static link. But since the standard defines no such response, the site generating the OpenURL would have to know something about the resolver. I actually think this lack of any specified response format is a large factor in the stagnation of OpenURL as a technology. Since a resolver is under no obligation to do anything but present a web page it's difficult for local entrepreneurial types to build upon the infrastructure simply because there are no guarantees that it will work anywhere else (or even locally, depending on your vendor, I suppose), much less contribute back to the ecosystem. Umlaut was able to exist because (for better or worse) SFX has an XML output. It has never been able to scale horizontally, however, because to work with another vendor's link resolver (which should actually be quite straightforward) it requires a connector to whatever *their* proprietary API needs. I could definitely see a project like Umlaut providing a 'de facto' machine readable response for SAP 1/2 requests that content providers could then use to start offering better integration at *their* end. This assumes that more than 5 libraries would actually be using it, however. -Ross.
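For illustration, the kind of machine-readable resolver response Ross says the standard never specified might look something like this; the shape is entirely hypothetical (Z39.88 defines no response format), loosely modeled on the service lists resolvers render as HTML:

```python
import json

# Entirely hypothetical response shape -- Z39.88 specifies no such format.
resolver_response = {
    "ctx_id": "12345",
    "referent": {"genre": "article", "issn": "0031-0239",
                 "atitle": "An unusual new neosauropod dinosaur"},
    "services": [
        {"type": "fulltext", "provider": "Wiley",
         "url": "http://example.com/fulltext/10.1111/xyz"},
        {"type": "holdings", "provider": "Local catalog",
         "url": "http://example.com/catalog/record/1"},
    ],
}

def fulltext_targets(response):
    """Pull out just the full-text links a client could render inline."""
    return [s["url"] for s in response["services"] if s["type"] == "fulltext"]
```

With something like this standardized, a content provider could turn a static OpenURL link into an inline "get full text" affordance without knowing which vendor's resolver sits behind it.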
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 8:17 AM, MJ Suhonos m...@suhonos.ca wrote: Okay, I know it's cool to hate on OpenURL, but I feel I have to clarify a few points: It's not that it's cool to hate on OpenURL, but if you've really worked with it it's easy to grow bitter. snip Maybe if I put it that way, OpenURL sounds a little less crappy. No, OpenURL is still crappy and it will always be crappy, I'm afraid, because it's tremendously complicated, mainly from the fact that it tries to do too much. The reason that context-sensitive services based on bibliographic citations comprise 99% of all OpenURL activity is because: A) that was the problem it was originally designed to solve B) it's the only thing it really does well (and OpenURL 1.0's insistence on being able to solve any problem almost takes that strength away from it) The barriers to entry + the complexity of implementation almost guarantee that there's a better or, at any rate, easier alternative to any problem. The difference between OpenURL and DublinCore is that the RDF community picked up on DC because it was simple and did exactly what they needed (and nothing more). A better analogy would be Z39.50 or SRU: two non-library-specific protocols that, for their own reasons, haven't seen much uptake outside of the library community. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 10:32 AM, Rosalyn Metz rosalynm...@gmail.com wrote: I'm going to throw in my two cents. I dont think (and correct me if i'm wrong) we have mentioned once what a user might actually put in a twitter annotation. a book title? an article title? a link? I think the idea is these would be machine generated from an application. So, imagine LT, Amazon, Delicious Library or SFX having a Tweet this! button and *that* provides the annotation (not the user). i think creating some super complicated thing for a twitter annotation dooms it to failure. after all, its twitter...make it short and sweet. Indeed, it's limited. also the 1.0 document for OpenURL isn't really that bad (yes I have read it). a good portion of it is a chart with the different metadata elements. also open url could conceivably refer to an animal and then link to a bunch of resources on that animal, but no one has done that. i don't think that's a problem with OpenURL i think thats a problem with the metadata sent by vendors to link resolvers and librarians lack of creativity (yes i did make a ridiculous generalization that was not intended to offend anyone but inevitably it will). having been a vendor who has worked with openurl, i know that the information databases send seriously affects what you can actually do in a link resolver. No, this is the mythical promise of 1.0, but delivery is, frankly, much more complicated than that. It is impractical to expect an OpenURL link resolver to make sense of any old thing you throw at it and return sensible results. This is the point of the community profiles, to narrow the infinite possibilities a bit. None of our current profiles would support the scenario you speak of and I would be surprised if such a service were to be devised, that it would be built on OpenURL. 
I think it's very easy to underestimate how complicated it is to actually build something using OpenURL since in the abstract it seems like a very logical solution to any problem. -Ross. On Thu, Apr 29, 2010 at 10:23 AM, Tim Spalding t...@librarything.com wrote: Can we just hold a vote or something? I'm happy to do whatever the community here wants and will actually use. I want to do something that will be usable by others. I also favor something dead simple, so it will be implemented. If we don't reach some sort of conclusion, this is an interesting waste of time. I propose only people engaged in doing something along these lines get to vote? Tim
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 11:21 AM, Jonathan Rochkind rochk...@jhu.edu wrote: (Last time I looked at Bibo, I recall there was no place to put a standard identifier like a DOI. So maybe using Bibo + URI for standard identifier would suffice. etc.) BIBO has all sorts of identifiers (including DOI): http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/dataproperties/doi___1125128004.html As well as ISBN (10 and 13), ISSN/e-issn, LCCN, EAN, OCLCNUM, and more. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
I still don't really see how what you're talking about would practically be accomplished. For one, to have rft.subject, like you mention, would require using the Dublin Core context set. Since that wouldn't be useful on its own for the services that link resolvers currently offer, OpenURL sources (i.e., A&I database providers) would have to support SAP 2 (XML) context objects so they can pass the book/journal/patent/etc. referent metadata along with the Dublin Core referent metadata. It also becomes a POST rather than a simple link (GET). What I'm saying is it ups the requirements on all ends of the ecosystem, for what? What you're talking about would be *much* more easily implemented via SRU and CQL (or OpenSearch), anyway, since your example is really performing a search. Since OpenURL doesn't have any semblance of a standardized response format, a client wouldn't know what to do with the response, anyway. -Ross. On Thu, Apr 29, 2010 at 11:29 AM, Rosalyn Metz rosalynm...@gmail.com wrote: ok right now exlibris has a recommender service for sfx that stores metadata from an openurl. let's say a vendor bothered to pass an element like rft.subject=hippo (which is unlikely to happen since they can't even pass an issn half the time). that subject got stored in the recommender service. next time a child saw something in ebsco animals about hippos they could click the find this button (or whatever it says) and the recommender service could bring up everything on hippos. the openurl that would be passed would be something like http://your.linkresolver.com/name?rft.subject=hippo yes this is simplistic, but it's more creative than, say, doing something boring like just bringing up the full text or doing something half-assed creative like bringing up articles that are cited in the footnotes. 
and to say something like rft.subject (or whatever it might be called) is out of the scope of group profiles is a little absurd since we are talking about things that already have subjects attached to them (see any database or other library related system). of course you'll probably want to talk about next how subjects aren't standardized and that makes it impossible. that is true, but that isn't openurl's fault or the link resolver's fault, that's the database vendors who refuse to get with the program.
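Since Ross's point is that Rosalyn's hippo scenario is really a search, here is a rough sketch of what the same lookup could look like as an SRU searchRetrieve request with a CQL query. The endpoint URL is invented; the parameter names come from the SRU specification.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; operation/version/query/maximumRecords are
# standard SRU 1.1 searchRetrieve parameters, and the query is CQL.
base = "http://sru.example.org/catalog"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.subject = "hippopotamus"',
    "maximumRecords": "10",
}
url = base + "?" + urlencode(params)
print(url)
```

Unlike OpenURL, SRU also defines the shape of the response (an XML record list), so a client knows what to do with what comes back.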
Re: [CODE4LIB] Microsoft Zentity
On Wed, Apr 28, 2010 at 10:21 AM, Houghton,Andrew hough...@oclc.org wrote: If it's open source, I assume that it could be adapted to run under Mono and then you could run it on Linux, Macs, etc. It may even run under Mono, don't know, haven't played with it. Well, it requires SQLServer, so I think this is probably going to be much more difficult than it's worth. -Ross.
Re: [CODE4LIB] Microsoft Zentity
On Wed, Apr 28, 2010 at 10:17 AM, Ethan Gruber ewg4x...@gmail.com wrote: It seems to me that the major flaw of the software is that it isn't cross-platform, which comes as no surprise. But I feel Microsoft didn't do their market research. While the financial and business sectors are heavily reliant on Microsoft servers, American universities, and by extension, research libraries, are not. If they really wanted to make a commitment to support the academic community as they say on the Zentity website, they would have developed it for a platform that the academic community actually uses. This seems like sort of a snotty answer, honestly, and I find three flaws with it: 1) Research and intellectual output is not exclusive to large research universities, which means repositories should not be exclusive to ARL libraries 2) There are lots of academic Microsoft shops, esp. at the campus IT (or departmental IT) level. It's not beyond reason to think that a smaller university would prefer the repository be hosted by central IT (or that the chemistry department or engineering school in a larger university host their own repository). 3) E-Prints, for example, seems to be making an effort to commoditize and democratize the repository space a bit by making it as simple as possible to run an IR. MS is making this even simpler for places that already have Windows servers (which is a lot of places). There are plenty of reasons to criticize Microsoft, but I just don't see how Zentity is one of them. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
On Tue, Apr 27, 2010 at 7:02 AM, Jakob Voss jakob.v...@gbv.de wrote: The purpose of description can best be served by a format that can easily be displayed for human beings. You can either use a simple string or a well-known format. A string can be displayed but people will put all different citation formats in there. Right now there are only two established metadata formats that aim at creating a citation: a) BibTeX b) The input format of the Citation Style Language (CSL) This isn't entirely true. There's RIS (http://en.wikipedia.org/wiki/RIS_%28file_format%29), and BIBO (http://bibliontology.com/) is starting to become quite common in the linked data sphere. There's also BibJSON (http://www.bibkn.org/bibjson/index.html), which I've had open in a browser tab for months with the intention of actually looking at, and which actually seems quite well suited for how Twitter will store annotations. My opinion of it all along, however, has been very similar to yours -- why another citation format and why bind it so closely to a particular serialization? -Ross.
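To show what the formats in this thread actually look like, here is a toy rendering of one citation record as BibTeX and as RIS. The record and field choices are illustrative only, not any published profile of either format.

```python
# A toy citation record, rendered two ways to make the thread's formats
# concrete. Field selection here is illustrative, not a specification.
cite = {
    "type": "article",
    "key": "smith2010",
    "author": "Smith, Jane",
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "year": "2010",
}

def to_bibtex(c):
    fields = ",\n".join(
        f"  {k} = {{{c[k]}}}" for k in ("author", "title", "journal", "year")
    )
    return f"@{c['type']}{{{c['key']},\n{fields}\n}}"

def to_ris(c):
    # Common RIS tags: TY (type), AU (author), TI (title), JO (journal),
    # PY (year), ER (end of record)
    lines = ["TY  - JOUR", f"AU  - {c['author']}", f"TI  - {c['title']}",
             f"JO  - {c['journal']}", f"PY  - {c['year']}", "ER  - "]
    return "\n".join(lines)

print(to_bibtex(cite))
print(to_ris(cite))
```

A JSON serialization of the `cite` dict above is roughly the kind of thing BibJSON aims at: the same citation data, unbound from any one text syntax.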
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
The advantage of the NoSQL DBs is that they're schema-less, which allows much more flexibility in your data going in. However, it sounds like your schema may be pretty standardized -- I'm not sure what huge advantage (outside the aforementioned replication functionality) you'd get. -Ross. On Mon, Apr 12, 2010 at 10:55 AM, Thomas Dowling tdowl...@ohiolink.edu wrote: So let's say (hypothetically, of course) that a colleague tells you he's considering a NoSQL database like MongoDB or CouchDB, to store a couple tens of millions of documents, where a document is pretty much an article citation, abstract, and the location of full text (not the full text itself). Would your reaction be: That's a sensible, forward-looking approach. Lots of sites are putting lots of data into these databases and they'll only get better. This guy's on the bleeding edge. Personally, I'd hold off, but it could work. Schedule that 2012 re-migration to Oracle or Postgres now. Bwahahahah!!! Or something else? (http://en.wikipedia.org/wiki/NoSQL is a good jumping-in point.) -- Thomas Dowling tdowl...@ohiolink.edu
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote: The thing is, the NoSQL stuff is pretty much just a key-value store. There's generally no way to query the store; instead you can simply look up a document by ID. Actually, this depends largely on the NoSQL DBMS in question. Some are key-value stores (Redis, Tokyo Cabinet, Cassandra), some are document-based (CouchDB, MongoDB), some are graph-based (Neo4J), so I think blanket statements like this are somewhat misleading. CouchDB and MongoDB (for example) have the capacity to index the values within the document - you don't just have to look up things by document ID. -Ross.
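Ross's distinction can be illustrated with a toy in-memory model: schema-less documents keyed by ID, plus a secondary index over a field's values, which is roughly what CouchDB views and MongoDB indexes give you. This is plain Python for illustration, not client code for either database.

```python
from collections import defaultdict

# Two schema-less "documents": doc2 simply lacks a subject field.
docs = {
    "doc1": {"title": "On Hippos", "year": 2009, "subject": "hippopotamus"},
    "doc2": {"title": "On Rivers", "year": 2010},
}

def build_index(documents, field):
    """Map each value of `field` to the IDs of documents containing it --
    a toy stand-in for a document store's secondary index."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index

by_subject = build_index(docs, "subject")
print(by_subject["hippopotamus"])  # lookup by value, not by document ID
```

The point is only that "you can just look things up by ID" describes key-value stores, not document stores.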
Re: [CODE4LIB] OpenURL aggregator not doing so well
Yes, although, the problem is actually with Connotea: http://www.connotea.org/article/4c40adbf8ecaef53b3772b5a141e229d So we either need to talk to NPG or drop Connotea from the OpenURL planet. -Ross. On Fri, Apr 9, 2010 at 8:00 AM, Eric Hellman e...@hellman.net wrote: Take a look at http://openurl.code4lib.org/aggregator Any ideas how to make it work better? Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/
Re: [CODE4LIB] local c4l chatter and the listserv
Or if their regional mailing list could send the main list a digest email. Or something. -Ross. On Fri, Apr 9, 2010 at 6:30 PM, Frumkin, Jeremy frumk...@u.library.arizona.edu wrote: And by 'end case' of course I meant 'edge case'. -- jaf On Apr 9, 2010, at 3:26 PM, Frumkin, Jeremy wrote: Seems a bit complex to me. I'd be happy if people just remembered to announce things on the main list, such as we're holding this here event, and/or if you are interested in this event, sign up on this related discussion list. I'm not a big fan of architecting to an end case, and it feels like that's what this is. -- jaf On Apr 9, 2010, at 3:08 PM, Jonathan Rochkind wrote: We are stuck between two problems, with some people thinking only one of these is/would be a problem, and others not caring at all either way: * Local conference/meetup planning chatter overwhelms the listserv when it's on the main listserv * People don't find out about local conferences/meetups they are interested in when local chatter is somewhere else. My first thought is, gee, this really calls for some kind of threaded forum software, where people can subscribe to only the threads they want. But then I remember, a) that kind of software always sucks, and b) there must be a better web 2.0y way to do it. Just as hypothetical brainstorming, what if we did this: 1. Local code4lib groups are required (i.e., strongly strongly strongly encouraged, we can't really require anyone to do anything) to, if they have a local listserv at all, have that listserv somewhere that: a) Has _publically viewable archives_ b) Has an RSS-or-Atom feed of those archives, which requires no authentication to subscribe to [Google groups is one very easy way to get both those things, but certainly not the only one] 2. All those local listservs are listed on a wiki page, which local groups are required to add their listserv to. 3. We set up a planet aggregator of all those listserv's RSS. 4. Profit! 
That is, now: * People can sign up for an individual listserv they want, if they want. * People can view the up-to-date 'archives' of an individual listserv on the web if they want; * people can view the up-to-date 'archives' of the _aggregated_ C4L Local communication, via the aggregator. Using one of the many free RSS-to-email services on the web, people can sign up for an email subscription to the AGGREGATED C4L Local traffic, getting what some want to get with just one more subscription. That last part about the RSS-to-email thing is important for our 'requirements', but is kind of the sketchiest part. Potentially better is if we write our OWN RSS-to-email service (maybe one that will only allow subscriptions to the C4L Aggregator or one of its components), which we know will work okay, and which does some clever mail header munging so hitting reply to all on an email you get from the aggregator rss-to-email will send your message to the original listserv, so you really can treat your aggregator subscription just like a listserv if you want. Just brainstorming here. Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Gabriel Farrell [gsf...@gmail.com] Sent: Friday, April 09, 2010 4:47 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib North planning continues I'm hoping to attend the upcoming code4libnorth meeting because I heart Canada, but I'd rather not join yet another mailing list. If it gets canceled or something tell us on this list or put it on the wiki page, please? On Thu, Apr 8, 2010 at 11:46 AM, Walker, David dwal...@calstate.edu wrote: I'm not on that conference list, so don't really know how much traffic it gets. But it seems to me that, since these regional conferences are mostly being held at different times of the year from the main conference, the overlap would be minimal. Or not. I don't know. 
--Dave == David Walker Library Web Services Manager California State University http://xerxes.calstate.edu/ From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of William Denton [...@pobox.com] Sent: Thursday, April 08, 2010 7:45 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib North planning continues On 8 April 2010, Walker, David quoted: I think a good compromise is to have local meeting conversations on the code4libcon google group. That list is for organizing the main conference, with details about getting rooms, food, shuttle buses, hotel booking agents, who can MC Thursday afternoon, etc. Mixing that with organizational details *and* general discussion about all local chapter meetings would confuse everything, I think. Bill -- William
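Jonathan's "clever mail header munging" idea above is straightforward in practice: an RSS-to-email service sets Reply-To on outgoing mail so that replies go back to the originating local listserv rather than the aggregator. A minimal sketch with Python's standard library; all addresses here are made up.

```python
from email.message import EmailMessage

def aggregator_mail(subject, body, original_list):
    """Build an aggregator notification whose Reply-To points back at the
    local listserv the item came from, so replying rejoins that discussion."""
    msg = EmailMessage()
    msg["From"] = "aggregator@example.org"      # hypothetical aggregator address
    msg["To"] = "subscriber@example.com"        # hypothetical subscriber
    msg["Reply-To"] = original_list             # the header-munging trick
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

msg = aggregator_mail("[c4l-north] Meetup planning", "Details inside...",
                      "c4l-north@example.ca")
print(msg["Reply-To"])
```

This only covers the reply path; actually sending the mail and polling the RSS feeds would be separate pieces.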
Re: [CODE4LIB] newbie
sexy groovy - 43,200 On Thu, Mar 25, 2010 at 10:36 PM, Andrew Hankinson andrew.hankin...@gmail.com wrote: Just out of curiosity I tried them in quotes: sexy ruby - 72,200 sexy python - 37,900 sexy php - 25,100 sexy java - 16,100 sexy asp - 14,800 sexy perl - 8,080 sexy C++ - 177 sexy FORTRAN - 67 sexy COBOL - 8 I tried sexy lisp but the results were skewed by speech impediment fetishes. Which I'd say is even less strange than 8 people thinking you can write sexy COBOL. On 2010-03-25, at 10:20 PM, Tim Spalding wrote: Finally, I never would have put the strings PHP and sexiness in a sentence together (though I guess I just did). A simple Google search shows how very wrong you are: sexy php - 56,100,000 results sexy asp - 8,380,000 sexy java - 6,360,000 sexy ruby - 2,840,000 sexy perl - 532,000 sexy C++ - 488,000 sexy smalltalk - 113,000 sexy fortran - 107,000 sexy COBOL - 58,100 There are also very high results for sexy logo. Perhaps, since I was in fourth grade, someone's figured out something interesting to do with that stupid turtle! Tim
Re: [CODE4LIB] PHP bashing (was: newbie)
On Fri, Mar 26, 2010 at 10:22 AM, Mike Taylor m...@indexdata.com wrote: For someone who is just starting out in programming, I think the very last thing you want is a verbose language that makes you spend half your time talking about types that you don't really care about. I'm not saying there isn't a time and a place for static type-checking, but learning to program isn't it. +1 I couldn't agree more. To all points. And now that we know your language of choice, we are anxiously awaiting your MARC-8 support patch to ruby-marc, Mike. -Ross.
Re: [CODE4LIB] newbie
On Thu, Mar 25, 2010 at 12:29 PM, Aaron Rubinstein arubi...@library.umass.edu wrote: This is some of the best advice. Reading and adapting good code has been my favorite way to learn. There was a discussion a couple years back on a code4lib code repository of some kind[1]. I'd love to resurrect this idea. A private pastebin[2] might be a decent option. I also know that a number of us use GitHub[3], which allows for collecting syntax-highlighted code snippets and has some nifty social networking features that let you follow other coders and projects. GitHub is certainly not a solution for a code4lib repository but is another way to share code and learn from each other. I disagreed with this back in the day, and I still disagree with running our own code repository. There are too many good code hosting solutions out there for this to be justifiable. We used to run an SVN repo at code4lib.org, but we never bothered rebuilding it after our server got hacked. Actually I think GitHub/Google Code and their ilk are a much better solution -- especially for pastebins/gists/etc. What would be useful, though, is an aggregation of the code4lib community's activity spread across these sites, sort of like what the Planet does for blog postings, etc. or what Google Buzz does for the people I follow (i.e. I see their gists). I'd buy in to that (and help support it), but I'm not sure how one would go about it. -Ross.
Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas
On Mon, Mar 22, 2010 at 1:09 PM, Karen Coyle li...@kcoyle.net wrote: the records... It might work, I really want to try to model this. Wish we could get some folks together for a 1/2 day somewhere and JUST DO IT. +1 to this. Maybe a whole day or two, though. I totally agree we're past the point of hand-waviness and just need to model this stuff /pragmatically/ (i.e. in a manner we think we could actually use), at scale, and have something to point to. And then release whatever comes out of it so others can do the same thing. Honestly, I believe we're at a stage of librarian-exhaustion over RDA and FRBR that the first decent working example of this, however removed from the actual specs, will become the de facto standard. -Ross.
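As a starting point for the pragmatic modeling being discussed, the FRBR Group 1 chain (Work, Expression, Manifestation, Item) can be sketched as nested data structures. This is a deliberately crude illustration; the attribute names are invented for the example, not drawn from the FRBR or RDA vocabularies.

```python
from dataclasses import dataclass, field
from typing import List

# Toy sketch of the FRBR Group 1 hierarchy. Each level holds its children,
# so one Work fans out to the physical Items that libraries actually hold.
@dataclass
class Item:
    barcode: str

@dataclass
class Manifestation:
    isbn: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

moby = Work("Moby Dick",
            [Expression("eng",
                        [Manifestation("0451526996", [Item("b0001")])])])
print(moby.expressions[0].manifestations[0].items[0].barcode)
```

The interesting (and contentious) part at scale is deciding where real catalog data lands in this hierarchy, which is exactly the exercise Karen is proposing.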
Re: [CODE4LIB] Any examples of using OAI-ORE for aggregation?
Joe, I'm not sure if this conforms to what you're talking about, but have you seen the Library of Congress' OAI-ORE implementation for Chronicling America? http://chroniclingamerica.loc.gov/ http://chroniclingamerica.loc.gov/lccn/sn83030214.rdf -Ross. On Wed, Mar 10, 2010 at 1:44 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: Most of the examples I've seen of OAI-ORE seem to assume that you're ultimately interested in only one object within the resource map -- effectively, it's content negotiation. Has anyone ever played with using ORE to point at an aggregation, with the expectation that the user will be interested in all parts, and automatically download them? ... Let me give a concrete example: A user searches for some data ... we find (x) number of records that match their criteria, and they then weed the list down to 10 files of interest. We then save this request as a Resource Map, as part of an OAIS order. I then want to be able to hand this off to a browser / downloader / whatever to try to obtain the individual files. Currently, I have something that can take the request, and create a tarball on the fly, but we have the unfortunate situation when some of the data is near-line and/or has to be regenerated -- I'm trying to find a good way to effectively fork the request into multiple smaller request, some of which I can service now, and some for which I can return an HTTP 503 status (service unavailable) w/ a retry-after header. ... Has anyone ever tried doing something like this? Should I even be looking at ORE, or is there something that better fits with what I'm trying to do? Thanks for any advice / insight you can give -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center
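The fork Joe describes, splitting an aggregation into parts that can be served now and near-line parts answered with HTTP 503 plus a Retry-After header, can be sketched independently of any framework. The availability check and filenames here are stand-ins for whatever the archive actually knows about its storage.

```python
# Toy model: a set of files that are online right now; anything else is
# assumed near-line and must be regenerated, so we answer 503 with a
# Retry-After header per the HTTP spec.
ONLINE = {"file1.fits", "file2.fits"}

def respond(filename, retry_after_seconds=3600):
    """Return an (status, headers, body) tuple for one file in the order."""
    if filename in ONLINE:
        return (200, {}, f"contents of {filename}")
    return (503, {"Retry-After": str(retry_after_seconds)}, "")

order = ["file1.fits", "nearline.fits", "file2.fits"]
statuses = [respond(f)[0] for f in order]
print(statuses)
```

A client walking the resource map would download the 200s immediately and re-poll the 503s after the indicated delay.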
Re: [CODE4LIB] Vote for Code4Lib 2011 host is OPEN
Polls close midnight EDT March 23. May the best city win, -Ross. On Fri, Mar 12, 2010 at 5:37 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: Folks, We received three excellent proposals for hosting the 2011 conference, and now it is time to vote on them! Voting is open for a week. (Actually, I don't know the close date/time but we should have a week or so to vote. Ross?) How to vote: 1. Go here: http://vote.code4lib.org/election/index/15 2. Log in using your code4lib.org credentials (register at code4lib.org if you haven't done so already). If you have trouble authenticating, contact myself and Ryan Wick (ryanwick at gmail). 3. Click on a host's name to reveal a link to the full proposal 4. Assign each proposal a rank from 0 to 3, 0 being least desirable and 3 being the most. Please keep the conference requirements and desirables in mind as you make your selection: http://code4lib.org/conference/hosting 5. Once you are satisfied with your rankings, click Cast your ballot. 6. Want to change your rankings? You can! As often as you'd like, even, up until the vote closes. Feel free to watch http://vote.code4lib.org/election/results/15 for returns, or hop into irc://irc.freenode.net/code4lib and type @hosts2011. Thanks to Ross Works Hard For The Money Singer for setting the vote up, as always! -Mike
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification. Quite frankly, I think I (and I imagine others) would much rather see a more open, RFC-style process for creating a marc-json spec than "I talked to LC and here you go." Maybe I'm misreading this last paragraph a bit, however. -Ross.
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Young byo...@bigbluehat.com wrote: A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options. marc-json and BibJSON serve two different purposes: marc-json would need to be a lossless serialization of a MARC record, which may or may not contain bibliographic data (it may be an authority, holding or CID record, for example). BibJSON is more of a merging of data model and serialization (which, admittedly, is no stranger to MARC) for the purpose of bibliographic /citations/. So it will probably be lossy, and there would most likely be a lot of MARC data that is out of scope. That's not to say it wouldn't be useful to figure out how to get from MARC to BibJSON, but from my perspective it's difficult to see the advantage it brings (being tied to JSON) vs. BIBO. -Ross.
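One possible shape for the lossless MARC-in-JSON serialization Ross describes, keeping the leader, control fields, and data fields with indicators and ordered, repeatable subfields, is sketched below. This is an illustration of the requirements, not any published specification.

```python
import json

# Hypothetical lossless MARC-as-JSON shape: nothing from the record is
# discarded, and field/subfield order is preserved by using lists.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"tag": "001", "data": "12345"},          # control field
        {"tag": "245", "ind1": "1", "ind2": "0",  # data field
         "subfields": [{"a": "An example title /"},
                       {"c": "by Jane Smith."}]},
    ],
}

serialized = json.dumps(record)
roundtrip = json.loads(serialized)
print(roundtrip == record)  # the full structure survives a round trip
```

Contrast this with a citation format like BibJSON, where an authority or holdings record simply has no sensible representation.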
Re: [CODE4LIB] Code4Lib 2011 Proposals
The date is not etched in stone. -Ross. On Wed, Mar 3, 2010 at 9:35 AM, Ethan Gruber ewg4x...@gmail.com wrote: Ithaca in February sounds kind of depressing, honestly. On Wed, Mar 3, 2010 at 9:27 AM, Ma, Hong h...@miami.edu wrote: Agree with Carol. Austin is good. Thanks, Hong Hong Ma Information Systems Librarian Otto G. Richter Library University of Miami 1300 Memorial Dr., Rm.301-A Coral Gables, FL 33124 h...@miami.edu (305) 284-8844 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Carol Bean Sent: Wednesday, March 03, 2010 9:06 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals Snowy northern climes-- Carol (still hoping for a bid from Austin) From: Kevin S. Clarke kscla...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Date: 03/03/2010 09:00 AM Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals Sent by: Code for Libraries CODE4LIB@LISTSERV.ND.EDU On Wed, Mar 3, 2010 at 6:35 AM, John Fereira ja...@cornell.edu wrote: I've got a bit of conference planning burnout after being on the planning committee for the Jasig conference for the sixth time in a row but I'm inclined to throw out Ithaca, NY as a possible location for 2011. ooh, +1 ... I was born in Ithaca, but haven't been back since; I'd love an excuse to visit and explore! From what I hear, it would make a nice venue for c4l11. Kevin
Re: [CODE4LIB] Code4Lib 2011 Proposals
On Wed, Mar 3, 2010 at 9:55 AM, Paul Joseph pjjos...@gmail.com wrote: No need to be concerned about the vendors: they're the same suspects who sponsored C4L10. Just to be clear on this -- the same suspects actually shelled out far less for C4L10 than they had in the past. And we had far fewer sponsors than we had in, say, Portland (which required similar economic gymnastics, in a much stronger economy, to keep it affordable). -Ross.
Re: [CODE4LIB] Asheville Brews Cruise details payment info
HOW MANY PIZZAS CAN I PUT YOU DOWN FOR? Love, -Talis. Hi all, It is time to reveal the details about the Brews Cruise social activity planned for next Tuesday night at the Code4Lib 2010 conference [1]. Let's keep list noise to a minimum, so folks who have questions about the details, please e-mail me directly or, if it's discussion-worthy, stick to the code4libcon list. First off, the event is full. Sorry if you missed the cut. We were forced to set a limit of 48 persons because that's the max number of folks that will fit into two party buses, plus we don't want to overwhelm the staff at the breweries. There is, however, a waitlist that someone started on the sign-up page [2]. Secondly, I want to thank Talis [3] for stepping up and sponsoring a portion of this event. Our first stop on the cruise will be a brewery slash pizza joint and Talis has generously offered to pay for our pizza. Yay! Cost and payment options: The cost for the cruise is $40 per person. You have two options for paying: 1) Pay in advance by sending me $40 via PayPal. 2) Bring $40 with you on the night of the cruise. I've been told they have a hand-held credit card machine for the cash-strapped. Anyone who wants to can pay via PayPal, but I need at least 16 people to choose this option because the tour company wants to pre-bill my credit card for a minimum of 16 guests. There should be no fees involved if the money comes from your PayPal account or an associated bank account. The deadline for paying in advance is EOD Sunday, February 21st. 
If you wish to prepay via Paypal--you know you want to--here are the instructions: 1) Go to http://paypal.com 2) Click on Send Money 3) Enter lb...@reallywow.com in the To field 4) Enter your own address in the From field (unless you're logged in) 5) Click the Personal tab and choose Payment owed from the options 6) Click Continue 7) On the next page you can specify a message Subject of Brews Cruise Itinerary: - Pickup from the hotel is tentatively scheduled for 6:15pm. Those who haven't pre-paid should try to get there a little early. - Stop #1 will be the Asheville Pizza Brewing Co. where we will sample 16-20 different beers and consume our delicious, alcohol-absorbing, Talis-sponsored pizza. - Stop #2 will be Highland Brewing Company, Asheville's 1st and largest brewing company - Stop #3 will be the French Broad Brewery which specializes in a variety of European style beers. - Expected return to the hotel is around 9:30-10pm Thanks for signing up! I think it's going to be a great time! --jay PS, did I mention Talis is paying for the pizza! Yay, Talis! PPS, Talis employee, Ross Singer, will be attending the event. Be sure to ask him about Platforms. [1] http://wiki.code4lib.org/index.php/C4L2010_social_activities#Asheville_Brews_Cruise [2] http://wiki.code4lib.org/index.php/C4L2010_social_activities#Wait_List [3] http://www.talis.com/
Re: [CODE4LIB] Rails Hosting
Have you looked at Heroku (http://heroku.com/)? I've only used their freebie plan (so I have no idea how they compare pricewise), but it's been fantastic to get Ruby apps running there. Dreamhost also provides Passenger to their customers (http://wiki.dreamhost.com/Passenger) so that might be an option, too. -Ross. On Thu, Jan 14, 2010 at 11:15 AM, Kevin Reiss reiss.ke...@yahoo.com wrote: Hi, I was curious if anyone could recommend a hosting service that they've had a good Ruby on Rails experience with. I've been working with Bluehost but my experience has not been good. You need to jump through a lot of hoops just to get a moderately complicated Rails application running properly. The applications we are looking at deploying would be moderately active, 1,000-2,000 visits a day. Thanks for any comments in advance. Regards, Kevin Reiss
Re: [CODE4LIB] Rails Hosting
I think one thing to consider between Heroku and something like Slicehost is what exactly you have the resources/willingness to support. One of the things I've really liked about Heroku is that I basically just have to worry about my Ruby app, not maintaining a server environment. On the other hand, it's somewhat limiting as to what I can do there, so it's not a solution to every problem. -Ross. On Thu, Jan 14, 2010 at 11:44 AM, Rosalyn Metz rosalynm...@gmail.com wrote: Hi Kevin, I'm going to recommend Slicehost also. Again, I haven't used it but I met the (former) owner. He sold the business to Rackspace, which has an awesome reputation in the cloud computing world. They are #2 behind Amazon. Rosalyn On Thu, Jan 14, 2010 at 11:40 AM, Doran, Michael D do...@uta.edu wrote: Hi Kevin, Although I can't recommend any hosting based on personal experience, a while back I had bookmarked a recommended (by another code4libber) hosting site: Slicehost at http://www.slicehost.com/ I think they pretty much get out of the way and let you do what you want, development wise. Regarding Rails in particular, one of their testimonials said The only thing I can say is Wow! ... Rails up and running in 30 minutes. Another said ...I’m a Rails developer and a Linux enthusiast who can’t believe he found a Gentoo VPS with 256MB RAM for $20/month. And yet another ...I’m a freelance Rails developer, and my experience on an Ubuntu VPS has been fantastic compared to my previous shared hosting experience. [1] Again, this is *not* a recommendation from personal experience. 
-- Michael [1] http://www.slicehost.com/why-slicehost/testimonials # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kevin Reiss Sent: Thursday, January 14, 2010 10:16 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Rails Hosting Hi, I was curious if anyone could recommend a hosting service that they've had a good Ruby on Rails experience with. I've been working with Bluehost but my experience has not been good. You need to jump through a lot of hoops just to get a moderately complicated Rails application running properly. The applications we are looking at deploying would be moderately active, 1,000-2,000 visits a day. Thanks for any comments in advance. Regards, Kevin Reiss
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
I definitely agree with Bill here. There is definitely a totemistic attitude about vim or emacs being all the IDE I need. Knowing your way around vim (or possibly emacs) is certainly important -- after all, everybody has to eventually fix something remotely -- but just like languages, some editors look or feel better. My basic credo is that I want to find the absolute least resistance between what I see in my head and what eventually gets run. This applies to my office chair, my keyboard, my monitor, operating system, editor, language, SCM, deployment manager, etc. Every layer provides resistance that must be accounted for. Because in the end, I'm spending 8+ hours a day, 5+ days a week looking at and working on this setup; I might as well be comfortable. Personally, I use TextMate for practically all the code I write. I don't actually use the features that generally draw people to TextMate (SCM integration, macros for automating certain tasks in particular languages/frameworks, etc.) at all. I just like the way it looks, it's relatively lean, and I can easily cut and paste (which is my major knock on the character-based editors). I have a mouse, dammit, so let me use it. I also use NetBeans sometimes, although, honestly, it's only when I need to run SQL queries against JDBC databases anymore. If I were a real developer (meaning I wrote code intended to be compiled, etc.), I couldn't imagine not using something like NetBeans or Eclipse to automate some of the tedium. -Ross. On Wed, Jan 6, 2010 at 9:23 AM, Bill Dueber b...@dueber.com wrote: On Wed, Jan 6, 2010 at 8:53 AM, Joel Marchesoni jma...@email.wcu.eduwrote: I agree with Dan's last point about avoiding using a special IDE to develop with a language. I'll respectfully, but vehemently, disagree. I would say avoid *forcing* everyone working on the project to depend on a special IDE -- avoid lock-in. Don't avoid use. There's a spectrum of how much an editor/environment can know about a program. 
At one end is Smalltalk, where the development environment *is* the program. At the other end is something like LISP (and, to an extent, Ruby) where so little can be inferred from the syntax of the code that a smart IDE can't actually know much other than how to match parentheses. For languages where little can be known at compile time, an IDE may not buy you very much other than syntax highlighting and code folding. For Java, C++, etc., an IDE can know damn near everything about your project and radically up your productivity -- variable renaming, refactoring, context-sensitive help, jump-to-definition, method-name completion, etc. It really is a difference that makes a difference. I know folks say they can get the same thing from vim or emacs, but at that level those editors are no less complex (and a good deal more opaque) than something like Eclipse or NetBeans unless you already have a decade of experience with them. If you're starting in a new language, try a couple editors, too. Both Eclipse and NetBeans are free and cross-platform, and have support for a lot of languages. Editors like Notepad++, EditPlus, TextMate, jEdit, and BBEdit can all do very nice things with a variety of languages. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Online PHP course?
Seems to me that Dan's Hacker 101/201 preconfs fall into this sort of category. I think it would be really useful to see at a conference that didn't already appeal to the hacker set, like CiL or LITA or something. Even Access. -Ross. On Wed, Jan 6, 2010 at 2:20 PM, Tim Spalding t...@librarything.com wrote: I wonder if Code4Lib would ever be a good outlet for online programming tutorials or hack sessions. I mean, get 10 people on Etherpad or CodeArmy together, and Skype, and you could learn a lot, and do a lot. Tim
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
I realize you didn't want to start a religious war nor were you interested in the abstract reasons people chose a particular language, that being said... I honestly think choosing the best* development language is very similar to how one settles on politics, religion, diet, etc. Environment plays a part, of course, but, in the end, what generally works best is the language that jibes best with you and your personality. Since you've dabbled with several different languages, you've had to have come across this - some languages just feel better than others. This is, however, an entirely personal choice. Dan Chudnov, for example, seems to think in Python. When I tried Python, it never really clicked -- I muddled through a few projects but never really got it. I then got introduced to Ruby, everything made sense, and I never looked back. I recently did a project in Groovy/Grails and my takeaway was that it was a scripting language that only somebody that had spent their career as a Java developer could love. My coworker (who has spent his career as a Java developer) LOVES Groovy. He thinks Ruby is a Fisher-Price language. To each their own. Since you don't seem to have institutional constraints on what you can develop in, I would recommend you try something like this: Take a handful of languages that look interesting to you and try writing a simple app to take some of your data, model it and shove it into Solr and make an interface to look at it. Solr's pretty perfect for this sort of project: it's super simple to work with and immediately gives you something powerful and versatile to wrap your app around. If you can't make something useful quickly around Solr, then move on to the next language because that one's not for you. If the ones that click happen to be PHP, Python or Ruby, well, there you go. If not, I, for one, look forward to your new Lua (or whatever) based discovery interface. 
Ultimately, any project you choose for your discovery interface is going to require a lot of customization to make it work the way you want -- the key is finding the environment that least stands in the way of turning what's in your head into a working app. Good luck, -Ross. On Tue, Jan 5, 2010 at 6:04 PM, marijane white marijane.wh...@gmail.com wrote: Greetings Code4Lib, Long time lurker, first time poster here. I've been turning over this question in my mind for a few weeks now, and Joe Hourcle's postscript in the Online PHP Course thread has prompted me to finally try to ask it. =) I'm interested in hearing how the members of this list have gone about choosing development platforms for their library coding projects and/or existing open source projects (i.e., VuFind vs. Blacklight). For example, did you choose a language you already were familiar with? One you wanted to learn more about? Does your workplace have a standard enterprise architecture/platform that you are required to use? If you have chosen to implement an existing open source project, did you choose based on the development platform or project maturity and features or something else? Some background -- thanks to my undergraduate computer engineering studies, I have a pretty solid understanding of programming fundamentals, but most of my pre-LIS work experience was in software testing and did not require me to employ much of what I learned programming-wise, so I've mostly dabbled over the last decade or so. I've got a bit of experience with a bunch of languages and I'm not married to any of them. I also kind of like having excuses to learn new ones. My situation is this: I would like to eventually implement a discovery tool at MPOW, but I am having a hell of a time choosing one. I'm a solo librarian on a content team at a software and information services company, so I'm not really tied to the platforms used by the software engineering teams here. 
I know a bit of Ruby, so I've played with Blacklight some, got it to install on Windows and managed to import a really rough Solr index. I'm more attracted to the features in VuFind, but I don't know much PHP yet and I haven't gotten it installed successfully yet. My collection's metadata is not in an ILS (yet) and not in MARC, so I've also considered trying out more generic approaches like ajax-solr (though I don't know a lot of javascript yet, either). I've also given a cursory look at SOPAC and Scriblio. My options are wide open, and I'm having a rough time deciding what direction to go in. I guess it's kind of similar to someone who is new to programming and attempting to choose their first language to learn. I will attempt to head off a programming language religious war =) by stating that I'm not really interested in the virtues of one platform over another, moreso the abstract reasons one might have for selecting one. Have any of you ever been in a similar situation? How'd you get yourself unstuck? If you haven't, what do
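Ross's "shove some of your data into Solr" exercise above can be sketched in a few lines of Ruby. Everything here is illustrative rather than prescriptive: the sample records and field names are fabricated, and the localhost URL assumes a stock Solr install exposing the JSON update handler (available since Solr 1.4; the exact path may differ in your setup).

```ruby
require 'json'
require 'net/http'
require 'uri'

# Toy records standing in for "some of your data" -- the field names
# (title_t, subject_t) are made-up dynamic-field-style names, not from
# any real schema.
records = [
  { id: '1', title_t: 'Cataloging rules',     subject_t: 'Cataloging' },
  { id: '2', title_t: 'Ruby for librarians',  subject_t: 'Programming' }
]

# Serialize a batch of documents as a Solr JSON update payload.
def solr_payload(docs)
  JSON.generate(docs)
end

# POST the payload to Solr. The URL is an assumption about a default,
# locally running instance -- adjust for your install.
def post_to_solr(docs, url = 'http://localhost:8983/solr/update/json?commit=true')
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.post(uri.request_uri, solr_payload(docs),
              'Content-Type' => 'application/json')
  end
end

puts solr_payload(records)
# post_to_solr(records)  # uncomment once Solr is actually running
```

If building and posting a payload like this feels quick and natural in a given language, that is roughly the "does it click?" test Ross describes.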
Re: [CODE4LIB] resource sharing/ill oss
Has anybody followed up on Relais' announcement that their products will be open-sourced: http://www.relais-intl.com/relais/home/Relais%20Products%20Go%20Open%20Source%20-%20Press%20Release.pdf ? OpenILL (which was written by the University of Winnipeg in ColdFusion) also seems to have disappeared. -Ross. On Mon, Jan 4, 2010 at 12:29 PM, Eric Lease Morgan eric_mor...@infomotions.com wrote: Do you know of any resource sharing/ILL open source software? Prospero seems like a likely candidate, but it also seems to have gone missing. [1] http://bones.med.ohio-state.edu/prospero/ -- Eric Lease Morgan
Re: [CODE4LIB] T-shirt Design Contest
I've asked it before, I'll ask it again. Can we add the Roy Thong(tm)? -Ross. On Mon, Jan 4, 2010 at 1:02 PM, Smith,Devon smit...@oclc.org wrote: There is a cafepress store front for code4lib. There's nothing in it at the moment. http://www.cafepress.com/code4lib Last year I suggested that all tshirts be put in the store. Then I forgot all about it. Oops, my bad. Tentative guidelines: - The contest and the store are separate. You can enter the contest and not have your shirt in the store. - Shirts will be sold at cost for now. - To get your design in the store, send it to code4libcafepr...@decasm.com. By sending it to this address, you agree to the following terms: Designs submitted to the code4lib cafepress store will be sold for amounts to be determined by the code4libcon mailing list. Any revenue generated beyond costs will be spent according to voting on that list. You retain copyright, but grant permission to the code4lib community to sell any product available on cafepress with the submitted design. These guidelines are tentative and subject to change. You agree to be cool about that. For best results, tshirts for the cafepress store should be designed with this template: http://www.cafepress.com/content/si/temp_10x10_apparel.zip /dev -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Rosalyn Metz Sent: Monday, January 04, 2010 7:23 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] T-shirt Design Contest correct me if i'm wrong, but didn't someone already set up a store. i distinctly remember there being a roy tennant thong (as do others if you google it). it appears to have gone away though... On Sun, Jan 3, 2010 at 5:51 PM, Kevin S. Clarke kscla...@gmail.com wrote: I like that idea (and the idea of it as something that exists apart from the conference budget, but perhaps funds scholarships in the following year). 
I think last year someone suggested putting all the t-shirt submissions in there (not just the winning one - It lets folks buy the conference shirt, but also others that might appeal to them). I think anyone in the community could register http://shop.cafepress.com/lisforge and manage it as a means to fund scholarships (or contribute in some other way - depending on the amount raised). Kevin On Sun, Jan 3, 2010 at 5:32 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: We've talked before about setting up a code4lib CafePress store. Maybe we've already done it? It's an idea, at least. -Mike On Sun, Jan 3, 2010 at 16:40, Christina Salazar christinagama...@gmail.com wrote: Y'know... I think y'all should order extras and sell and ship them to those of us who cannot attend. I love my past conference t-shirts and they get some interesting reactions when I wear 'em. I'd buy any one of these designs. Seems like you might be able to make a bit of dough for scholarships and whatnot... Christina Salazar On Sun, Jan 3, 2010 at 12:23 PM, Patrick Hochstenbach patrick.hochstenb...@ugent.be wrote: Hello All, Here is the Inkscape entry designed by my lovely wife :) Greetings from Belgium, P@ Skype: patrick.hochstenbach Patrick Hochstenbach Software Architect University Library +32(0)92647980 Ghent University * Rozier 9 * 9000 * Gent
Re: [CODE4LIB] T-shirt Design Contest
I note they also have a boxer short option -- so, I'll up the ante to an entire line of Roy Tennant Undergarments -Ross. On Mon, Jan 4, 2010 at 10:31 PM, Ross Singer rossfsin...@gmail.com wrote: I've asked it before, I'll ask it again. Can we add the Roy Thong(tm)? -Ross. On Mon, Jan 4, 2010 at 1:02 PM, Smith,Devon smit...@oclc.org wrote: There is a cafepress store front for code4lib. There's nothing in it at the moment. http://www.cafepress.com/code4lib Last year I suggested that all tshirts be put in the store. Then I forgot all about it. Oops, my bad. Tentative guidelines: - The contest and the store are separate. You can enter the contest and not have your shirt in the store. - Shirts will be sold at cost for now. - To get your design in the store, send it to code4libcafepr...@decasm.com. By sending it to this address, you agree to the following terms: Designs submitted to the code4lib cafepress store will be sold for amounts to be determined by the code4libcon mailing list. Any revenue generated beyond costs will be spent according to voting on that list. You retain copyright, but grant permission to the code4lib community to sell any product available on cafepress with the submitted design. These guidelines are tentative and subject to change. You agree to be cool about that. For best results, tshirts for the cafepress store should be designed with this template: http://www.cafepress.com/content/si/temp_10x10_apparel.zip /dev -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Rosalyn Metz Sent: Monday, January 04, 2010 7:23 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] T-shirt Design Contest correct me if i'm wrong, but didn't someone already set up a store. i distinctly remember there being a roy tennant thong (as do others if you google it). it appears to have gone away though... On Sun, Jan 3, 2010 at 5:51 PM, Kevin S. 
Clarke kscla...@gmail.com wrote: I like that idea (and the idea of it as something that exists apart from the conference budget, but perhaps funds scholarships in the following year). I think last year someone suggested putting all the t-shirt submissions in there (not just the winning one - It lets folks buy the conference shirt, but also others that might appeal to them). I think anyone in the community could register http://shop.cafepress.com/lisforge and manage it as a means to fund scholarships (or contribute in some other way - depending on the amount raised). Kevin On Sun, Jan 3, 2010 at 5:32 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: We've talked before about setting up a code4lib CafePress store. Maybe we've already done it? It's an idea, at least. -Mike On Sun, Jan 3, 2010 at 16:40, Christina Salazar christinagama...@gmail.com wrote: Y'know... I think y'all should order extras and sell and ship them to those of us who cannot attend. I love my past conference t-shirts and they get some interesting reactions when I wear 'em. I'd buy any one of these designs. Seems like you might be able to make a bit of dough for scholarships and whatnot... Christina Salazar On Sun, Jan 3, 2010 at 12:23 PM, Patrick Hochstenbach patrick.hochstenb...@ugent.be wrote: Hello All, Here is the Inkscape entry designed by my lovely wife :) Greetings from Belgium, P@ Skype: patrick.hochstenbach Patrick Hochstenbach Software Architect University Library +32(0)92647980 Ghent University * Rozier 9 * 9000 * Gent
Re: [CODE4LIB] SVN/Mercurial hosting
Also, Google Code offers both HG and SVN support. http://code.google.com/projecthosting/ I have several projects there (although I haven't used Mercurial) and certainly find it a lot less frustrating than admin'ing Trac. -Ross. On Wed, Dec 16, 2009 at 2:39 PM, Mark A. Matienzo m...@matienzo.org wrote: Hi Yitzchak, I've been pretty happy with using BitBucket [1] to host Mercurial repositories. It doesn't have Trac, but it does have its own decently featured issue tracker, commit log viewer, and wiki system. The free plan is generous enough for you to get started. [1] http://bitbucket.org/ Mark A. Matienzo Applications Developer, Strategic Planning The New York Public Library On Wed, Dec 16, 2009 at 2:22 PM, Yitzchak Schaffer yitzchak.schaf...@gmx.com wrote: Hello all, As I was considering whether to migrate our SVN repositories to Mercurial (or possibly Bazaar) so as to allow for distributed control (like if I'm on the train or otherwise off the grid), I got word from our IT higher-ups that they want us to stop hosting our code on our domain and server. Before I start trekking around looking for hosting, does anyone in the crowd here have a server set up, and is potentially willing to host Trac+SVN or Trac+HG for our open-source projects? We currently have two. Alternately, I'd love to hear suggestions on regular hosting providers - particularly for Trac+Mercurial. Many thanks, -- Yitzchak Schaffer Systems Manager Touro College Libraries 33 West 23rd Street New York, NY 10010 Tel (212) 463-0400 x5230 Fax (212) 627-3197 Email yitzchak.schaf...@gmx.com Access Problems? Contact systems.libr...@touro.edu
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
I suppose it would be helpful to actually know the problem we're trying to solve here (I mean, a lot of people, including myself, are throwing out solutions to a problem that's never actually been defined). Ethan, what, exactly, are you trying to do? Do you want authorized headings? Or do you want LCSH that appears in the wild? -Ross. On Tue, Dec 8, 2009 at 10:35 AM, Ed Summers e...@pobox.com wrote: On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote: Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add facets to the authority entry when forming the subject heading. One could do a left-anchored match against actual headings, and that might provide some interesting statistics. Yes, using the actual headings extracted from bibliographic data seems to be a better approach. It's easier to rank them, and as Karen points out you get the actual post-coordinated headings, not just the headings LC has decided to establish authority records for. //Ed
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
It has an OpenSearch interface: http://id.loc.gov/authorities/opensearch But I don't think there's a way to explicitly limit to, say, the beginning of a label. lcsubjects.org has a SPARQL interface where you could use a regex filter on the labels: http://api.talis.com/stores/lcsh-info/services/sparql But it's probably not going to be fast enough for what you're talking about. There are around 647,500 distinct labels in the dump files, so it might be easier and better to just grab the n-triples file, pull the lines with http://www.w3.org/2004/02/skos/core#prefLabel and http://www.w3.org/2004/02/skos/core#altLabel and shove them in a local data store. On the other hand, if you don't care where your autocomplete string is coming from in the label, you could try: http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Av+||+altlabel%3Av&max=10&offset=0&sort=&xsl=&content-type= http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Avi+||+altlabel%3Avi&max=10&offset=0&sort=&xsl=&content-type= http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Avir+||+altlabel%3Avir&max=10&offset=0&sort=&xsl=&content-type= etc. -Ross. On Mon, Dec 7, 2009 at 10:46 AM, Ethan Gruber ewg4x...@gmail.com wrote: Hi all, I have a need to integrate the LCSH terms into a web form that uses auto-suggest to control the vocabulary. Is this technically possible with the id.loc.gov service? I can curl a specific id to view the rdf, but I would need to know the specifics of the search index on the site to feed the auto-suggest. For example, when the user types va in the box, the results should filter all subject headings that begin with va. I can certainly accomplish this by indexing the ~400 meg XML file into Solr and using TermsComponent to filter terms dynamically as the user types, but I'd rather use the LOC's service if possible. So my question is: has anyone successfully done this before in the way I described? Thanks, Ethan Gruber University of Virginia Library
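Ross's "grab the n-triples, pull the label lines, shove them in a local data store" suggestion is simple enough to sketch in Ruby. The triples below are fabricated samples in the style of the LCSH dumps (the URIs and labels are illustrative only); a real run would stream the full .nt file and persist the results rather than keep them in memory.

```ruby
# SKOS predicates used for labels in the LCSH dumps.
PREF = '<http://www.w3.org/2004/02/skos/core#prefLabel>'
ALT  = '<http://www.w3.org/2004/02/skos/core#altLabel>'

# Fabricated sample triples standing in for the real dump file.
sample = <<~NT
  <http://id.loc.gov/authorities/sh00000001> #{PREF} "Vampires"@en .
  <http://id.loc.gov/authorities/sh00000002> #{PREF} "Violins"@en .
  <http://id.loc.gov/authorities/sh00000002> #{ALT} "Fiddles"@en .
NT

# Build uri => [labels]; a real run would do sample = File.foreach('lcsh.nt').
labels = Hash.new { |h, k| h[k] = [] }
sample.each_line do |line|
  if (m = line.match(/^(<[^>]+>)\s+<[^>]+(?:prefLabel|altLabel)>\s+"([^"]+)"/))
    labels[m[1]] << m[2]
  end
end

# Left-anchored, case-insensitive autocomplete over all labels,
# returning [label, uri] pairs.
def suggest(labels, prefix)
  labels.flat_map do |uri, ls|
    ls.grep(/\A#{Regexp.escape(prefix)}/i).map { |l| [l, uri] }
  end
end

p suggest(labels, 'Vi')  # left-anchored match, as in Ethan's "va" example
```

For production use the same hash could be loaded into a key-value store or a Solr index with TermsComponent, as Ethan describes.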
Re: [CODE4LIB] calling another webpage within CGI script - solved!
On Tue, Nov 24, 2009 at 11:18 AM, Graham Stewart graham.stew...@utoronto.ca wrote: We run many Library / web / database applications on RedHat servers with SELinux enabled. Sometimes it takes a bit of investigation and horsing around but I haven't yet found a situation where it had to be disabled. setsebool and chcon can solve most problems and SELinux is an excellent enhancement to standard filesystem and ACL security. Agreed that SELinux is useful but it is a teetotal pain in the keister if you're ignorantly working against it because you didn't actually know it was there. It's sort of the perfect embodiment of the disconnect between the developer and the sysadmin. And, if this sort of tension interests you, vote for Bess Sadler's presentation at Code4lib 2010: Vampires vs. Werewolves: Ending the War Between Developers and Sysadmins with Puppet and anything else that interests you. http://vote.code4lib.org/election/index/13 -Ross Bringin' it on home Singer.
Re: [CODE4LIB] Assigning DOI for local content
On Mon, Nov 23, 2009 at 1:07 PM, Eric Hellman e...@hellman.net wrote: Does this answer your question, Ross? Yes, sort of. My question was not so much whether you can resolve handles via bindings other than HTTP (since that's one of the selling points of handles) as whether people actually use this in the real world. Of course, it may be impossible to answer that question since, by your example, such people may not actually be letting anybody know that they're doing that (although you would probably be somebody with insider knowledge on this topic). Also, with your use cases, would these services be impossible if the only binding was HTTP? Presumably dx.hellman.net would need to harvest its metadata from somewhere, which seems like it would leave a footprint. It also needs some mechanism to stay in sync with the master index. Your non-resolution service also seems to be looking these things up in realtime. Would a RESTful or SOAP API (*shudder*) not accomplish the same goal? Really, though, the binding argument is less the issue here than whether you believe http URIs are valid identifiers, since there's no reason a URI couldn't be dereferenced via other bindings, either. -Ross.
Re: [CODE4LIB] Assigning DOI for local content
On Mon, Nov 23, 2009 at 2:52 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Well, here's the trick about handles, as I understand it. A handle, for instance, a DOI, is 10.1074/jbc.M004545200. Well, actually, it could be: 10.1074/jbc.M004545200 doi:10.1074/jbc.M004545200 info:doi/10.1074/jbc.M004545200 etc. But there's still got to be some mechanism to get from there to: http://dx.doi.org/10.1074/jbc.M004545200 or http://dx.hellman.net/10.1074/jbc.M004545200 I don't see why it's any different, fundamentally, than: http://purl.hellman.net/?purl=http%3A%2F%2Fpurl.org%2FNET%2Fdoi%2F10.1074%2Fjbc.M004545200 besides being prettier. Anyway, my argument wasn't that Purl was technologically more sound than handles -- Purl services have a major single-point-of-failure problem -- it's just that I don't buy the argument that handles are somehow superior because they aren't limited to HTTP. What I'm saying is that there are plenty of valid reasons to value handles more than purls (or any other indirection service), but independence from HTTP isn't one of them. -Ross. While, for DOI handles, normally we resolve that using dx.doi.org, at http://dx.doi.org/10.1074/jbc.M004545200, that is not actually a requirement of the handle system. You can resolve it through any handle server, over HTTP or otherwise. Even if it's still over HTTP, it doesn't have to be at dx.doi.org, it can be via any handle resolver. For instance, check this out, it works: http://hdl.handle.net/10.1074/jbc.M004545200 Cause the DOI is really just a subset of Handles, any resolver participating in the handle network can resolve em. In Eric's hypothetical use case, that could be a local enterprise handle resolver of some kind. 
(Although I'm not totally sure that would keep your usage data private; the documentation I've seen compares the handle network to DNS, it's a distributed system, and I'm not sure in what cases handle resolution requests are sent 'upstream' by the handle resolver, and whether actual individual lookups are revealed by that or not. But in any case, when Ross suggests -- Presumably dx.hellman.net would need to harvest its metadata from somewhere, which seems like it would leave a footprint. It also needs some mechanism to stay in sync with the master index. -- my reading of this suggests this is _built into_ the handle protocol, it's part of handle from the very start (again, the DNS analogy, with the emphasis on the distributed resolution aspect), you don't need to invent it yourself. The details of exactly how it works, I don't know enough to say.) Now, I'm somewhat new to this stuff too, I don't completely understand how it works. Apparently hdl.handle.net can handle, er, deal with any handle globally, while presumably dx.doi.org can only deal with the subset of handles that are also DOIs. And apparently you can have a handle resolver that works over something other than HTTP too. (Although Ross argues, why would you want to? And I'm inclined to agree.) But it appears that the handle system is quite a bit more fleshed out than a simple purl server; it's a distributed, protocol-independent network. The protocol-independent part may or may not be useful, but it certainly seems like it could be; it doesn't hurt to provide for it in advance. The distributed part seems pretty cool to me. So if it's no harder to set up, maintain, and use a handle server than a Purl server (this is a big 'if', I'm not sure if that's the case), and handle can do everything purl can do and quite a bit more (I'm pretty sure that is the case)... why NOT use handle instead of purl? It seems like handle is a more fleshed out, robust, full-featured thing than purl. 
Jonathan
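Ross's indirection point is easy to see in code: the same handle string yields interchangeable HTTP entry points, and the purl-style form is just the same thing with URL-encoding. A quick Ruby sketch (note that dx.hellman.net and purl.hellman.net are the thread's hypothetical services, not real resolvers):

```ruby
require 'cgi'

# A DOI is a handle; these are just different HTTP front doors to it.
handle = '10.1074/jbc.M004545200'

doi_url = "http://dx.doi.org/#{handle}"      # the DOI proxy
hdl_url = "http://hdl.handle.net/#{handle}"  # the general handle proxy

# The purl-style indirection from the thread: the target URL is
# percent-encoded into a query parameter.
purl_style = 'http://purl.hellman.net/?purl=' +
             CGI.escape("http://purl.org/NET/doi/#{handle}")

puts doi_url
puts hdl_url
puts purl_style
```

All three are HTTP-level indirection over the same identifier, which is the heart of the "independence from HTTP isn't the differentiator" argument.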
Re: [CODE4LIB] Assigning DOI for local content
On Fri, Nov 20, 2009 at 2:23 PM, Eric Hellman e...@hellman.net wrote: Having incorporated the handle client software into my own stuff rather easily, I'm pretty sure that's not true. Fair enough. The technology is binding independent. So you are using and sharing handles using some protocol other than HTTP? I'm more interested in the sharing part of that question. What is the format of the handle identifier in this context? What advantage does it bring over HTTP? -Ross.
Re: [CODE4LIB] Assigning DOI for local content
Back in 2007, I had a different job, different email address and lived in a different state. Things change. If people are sending emails to ross.sin...@gatech.edu to fix the library web services, they are going to be sorely disappointed and should perhaps check http://www.library.gatech.edu/about/staff.php for updates. purl.org has been going through a massive architecture change for the better part of a year now -- which has finally been completed. It was a slightly messy transition, but they migrated from their homegrown system to one designed by Zepheira. I feel like predicting the demise of HTTP and worrying about a service's ability to handle other protocols is unnecessary hand-wringing. I still have a telephone (two, in fact). Both my cell phone and VOIP home phone are still able to communicate flawlessly with a POTS dial phone. My car still has an internal combustion engine based on petroleum. It still doesn't fly or even hover. My wall outlets still accept a plug made in the 1960s. PURLs themselves are perfectly compatible with protocols other than HTTP: http://purl.org/NET/rossfsinger/ftpexample The caveat being that the initial access point is provided via HTTP. But then again, so is http://hdl.handle.net/, which is, in fact, currently the only way in practice to dereference handles. My point is, there's a lot of energy, resources and capital invested in HTTP. Even if it becomes completely obsolete, my guess is I can still type "http://purl.org/dc/terms" in spdy://google.com/ and find something about what I'm looking for. -Ross. On Thu, Nov 19, 2009 at 12:18 PM, Han, Yan h...@u.library.arizona.edu wrote: Please explain in more details; that will be more helpful. It has been a while. Back in 2007, I checked PURL's architecture, and it was strictly handling web addresses only. Of course, the current HTTP protocol is not going to last forever, and there are other protocols on the Internet. The coverage of PURL is not enough. 
From PURL's website, it still says PURLs (Persistent Uniform Resource Locators) are Web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. I am not sure what web addresses means. http://www.purl.org/docs/help.html#overview says PURLs are Persistent Uniform Resource Locators (URLs). A URL is simply an address on the World Wide Web. We all know that the World Wide Web is not the Internet. What if an info resource can be accessed through other Internet protocols (FTP, VOIP, ...)? This is the limitation of PURL. PURL is doing re-architecture, though I cannot find more documentation. The Handle System is a general purpose distributed information system that provides efficient, extensible, and secure HDL identifier and resolution services for use on networks such as the Internet. http://www.handle.net/index.html Notice the difference in definition. Yan -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Wednesday, November 18, 2009 8:11 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Assigning DOI for local content On Wed, Nov 18, 2009 at 12:19 PM, Han, Yan h...@u.library.arizona.edu wrote: Currently DOI uses Handle (technology) with its social framework (i.e. administrative body to manage DOI). In a technical sense, PURL is not going to last long. I'm not entirely sure what this is supposed to mean (re: purl), but I'm pretty sure it's not true. I'm also pretty sure there's little to no direct connection between purl and doi despite a superficial similarity in scope. -Ross.
Re: [CODE4LIB] Assigning DOI for local content
On Wed, Nov 18, 2009 at 12:19 PM, Han, Yan h...@u.library.arizona.edu wrote: Currently DOI uses Handle (technology) with its social framework (i.e. an administrative body to manage DOI). In a technical sense, PURL is not going to last long. I'm not entirely sure what this is supposed to mean (re: purl), but I'm pretty sure it's not true. I'm also pretty sure there's little to no direct connection between purl and doi despite a superficial similarity in scope. -Ross.
Re: [CODE4LIB] holdings standards/protocols
On Mon, Nov 16, 2009 at 9:58 AM, Chris Keene c.j.ke...@sussex.ac.uk wrote: Looks like our Talis system can't do this using the same process :( No, holdings aren't exported to Zebra. That being said, the opacxml format could be pretty easily added to the jangle connector. There's also something similar (well, sort of) in Keystone. What exactly are you looking for? Does this functionality work with AquaBrowser implementations on Voyager or III? I guess what I'm asking is: is the Z39.50 holdings format exactly what you want, or would there be a more ideal format to use? The opac format gets pretty gnarly with serials, for example (of course, everything does). -Ross.
Re: [CODE4LIB] MARC8 in marc-ruby
ruby-marc does not have any capacity to convert MARC-8 in any ruby interpreter: MRI, JRuby, Rubinius, whatever. Given the amount of work required to include this (unless Mark Matienzo feels like hacking into ruby-marc what he did for pymarc), I think I'd need to see a really compelling need (that can't be solved by one of the options that Ed mentioned) before making this much of a priority. -Ross. On Mon, Nov 2, 2009 at 11:42 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I thought that marc-ruby did MARC8 already! Wait, does it just do it in the 'native' ruby interpreter, but not in jruby? I'm dealing with records in MARC8 now, I think, with marc-ruby, and it looked like the non-roman characters were coming across okay! I might need to go investigate my setup further now. Jonathan Ed Summers wrote: Hi Brendan: Ahh, the lovely MARC-8 :-) It's a fair bit of effort, I think. One approach could be to port the MARC8-to-Unicode functionality from pymarc [1,2]. It's only one-way, but that's normally what most sane people want to do anyhow. Another approach would be to look into wrapping yaz-iconv [3] from IndexData, which provides many more (and faster) MARC-related character mapping facilities. If you just want to get something done without extending ruby-marc, you can pre-process your data with yaz-marcdump and then throw it at ruby-marc. Or perhaps if you are in jruby-land you could use marc4j [4], which has MARC-8 support. I've cc'ed code4lib since someone else might have some better ideas. Thanks for writing. //Ed [1] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8.py [2] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8_mapping.py [3] http://www.indexdata.com/yaz/doc/yaz-iconv.html [4] http://marc4j.tigris.org/ On Fri, Oct 30, 2009 at 3:22 AM, Brendan Boesen bboe...@nla.gov.au wrote: Hi Guys, I guess this is the 'bug the authors if you need it' email. I'm trying to parse a MARC record and it contains Chinese characters. 
From the leader: 01051cam 2200265 a 4504 it looks like the record uses MARC8 encoding. I'm investigating a way to get a Unicode encoded one but that may not work out. What sort of effort do you think is involved in adding MARC8 support into marc-ruby? (And is there anything I could do to help with that?) Regards, Brendan Boesen National Library of Australia
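Incidentally, the eyeballing Brendan does on the leader can be automated: in MARC 21, leader byte 9 carries the character coding scheme -- blank means MARC-8, 'a' means UCS/Unicode. A stdlib-only Ruby sketch, using illustrative leader strings rather than Brendan's actual record:

```ruby
# MARC 21 leader byte 9 is the character coding scheme:
# ' ' means MARC-8, 'a' means UCS/Unicode (UTF-8).
def marc8?(leader)
  leader[9] == ' '
end

puts marc8?('01051cam  2200265 a 4500')  # blank at byte 9 -> MARC-8
puts marc8?('01051cam a2200265 a 4500')  # 'a' at byte 9   -> Unicode
```

A check like this could decide whether a file needs to go through a transcoding step (e.g. the yaz-marcdump pre-processing Ed suggests) before being handed to ruby-marc.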
Re: [CODE4LIB] Greenstone: tweaking Lucene indexing
Yitzchak, are you interested in actually searching the fulltext? Or just highlighting the terms? If you're only interested in highlighting, it might be a whole lot easier to implement this in javascript through something like jQuery: http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html That way you're not juggling mostly redundant Lucene indexes and trying to keep them synced. How are you getting your search results? Does Greenstone have some sort of search API that returns the highlighted results? Would it make a difference if you could add a field to the Lucene document (meaning, would you have access to it through your PHP API to Greenstone)? If so, you could probably do this pretty easily via one of the JVM scripting languages (Groovy, JRuby, Jython, Quercus -- PHP in the JVM) so you just have the single Lucene index instead of multiple. Another approach might be to serve the Lucene index via Solr [1] or Lucene-WS (http://lucene-ws.net/), which would allow you to skip Greenstone altogether for searching. Basically, I would try to avoid going the Zend_Lucene route if at all possible. -Ross. 1. http://www.google.com/search?q=solr+on+an+existing+lucene+index&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a On Tue, Sep 29, 2009 at 11:32 AM, Yitzchak Schaffer yitzchak.schaf...@gmx.com wrote: Erik Hatcher wrote: I'm a bit confused then. You mentioned that somehow Zend Lucene was going to help, but if you don't have the text to highlight anywhere then the Highlighter isn't going to be of any use. Again, you don't need the full text in the Lucene index, but you do need to get it from somewhere in order to be able to highlight it. Erik, I started to port the native Greenstone Java Lucene wrapper to PHP, so I could then modify it to add this feature, as I don't know Java. This would mean using Zend Lucene for the actual indexing implementation. 
My question is whether anyone's already done it, in Java or otherwise. Thanks for the clarification, -- Yitzchak Schaffer Systems Manager Touro College Libraries 33 West 23rd Street New York, NY 10010 Tel (212) 463-0400 x5230 Fax (212) 627-3197 Email yitzchak.schaf...@gmx.com
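If highlighting really is the whole requirement, it can also be done server-side with nothing but string substitution, independent of Greenstone, Lucene or Zend: wrap each matched term in a span and style it with CSS. A minimal Ruby sketch (the `hl` class name is arbitrary):

```ruby
# Wrap every occurrence of each search term in a highlight span.
# Case-insensitive match; the original casing of the text is preserved.
def highlight(text, terms)
  terms.reduce(text) do |marked, term|
    marked.gsub(/(#{Regexp.escape(term)})/i, '<span class="hl">\1</span>')
  end
end

puts highlight('Indexing with Lucene and Greenstone', ['lucene'])
# Indexing with <span class="hl">Lucene</span> and Greenstone
```

This sidesteps the index-syncing problem the same way the jQuery plugin does, just on the other side of the wire.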
Re: [CODE4LIB] A few Ruby MARC announcements
Thanks for pointing that out, Ed. Since, of course, the only thing worse than lies and damn lies are, as we know, statistics, let me give some context here :) These benchmarks were run on a 45MB MARCXML document with a little less than 17k records in it that I happened to have on my machine. Hopefully that helps clear up the numbers a bit (although probably not). I can definitely say that the process was not entirely scientific -- it was run on my work machine, during work hours, with other work-related applications running. But I ran them a couple of times each and they are pretty representative of the average. Thanks Ed (and also Kevin Clarke and Will Groppe) for making rubymarc in the first place, and thanks to Jonathan Rochkind and Bill Dueber for helping flesh out how these pluggable parsers/serializers should work. -Ross. On Thu, Sep 24, 2009 at 12:48 AM, Ed Summers e...@pobox.com wrote: Nice work Ross! Users of rubymarc might like to see the performance enhancements that motivated you to do the nokogiri integration: http://paste.lisp.org/display/87529 !!! //Ed On Wed, Sep 23, 2009 at 10:51 PM, Ross Singer rossfsin...@gmail.com wrote: Hi everybody, Apologies for the crossposting. I wanted to let people know that Ruby MARC 0.3.0 was just released as a gem. This version addresses the biggest complaint about Ruby MARC, which was the fact that it could only parse MARCXML with REXML, Ruby's native XML parser (which, if you've used it, you hate it). Now you can use Nokogiri (http://nokogiri.rubyforge.org/) or, if you're using JRuby, jrexml instead of REXML, if you want. This release *shouldn't* break anything. The rubyforge project is here: http://rubyforge.org/projects/marc The rdocs are here: http://marc.rubyforge.org/ The source is here: http://marc.rubyforge.org/svn/ To install: sudo gem install marc While I'm making MARC and Ruby related announcements, I'd like to point out a project I released a couple of weeks ago that sits on top of Ruby MARC, called enhanced-marc. 
It's basically a domain specific language for working with the MARC fixed fields and providing a set of objects and methods to more easily parse what the record is describing. For example:

require 'enhanced_marc'

reader = MARC::Reader.new('marc.dat')
records = []
reader.each do |record|
  records << record
end

records[0].class                     # => MARC::BookRecord
records[0].is_conference?            # => false
records[0].is_manuscript?            # => false
# Send a boolean true if you want human readable forms, rather than MARC codes.
records[0].literary_form(true)       # => "Non-fiction"
records[0].nature_of_contents(true)  # => ["Bibliography", "Catalog"]
records[1].class                     # => MARC::SoundRecord
records[1].composition_form(true)    # => "Jazz"
records[2].class                     # => MARC::MapRecord
records[2].projection(true)          # => ["Cylindrical", "Mercator"]
records[2].relief(true)              # => ["Color"]

The enhanced-marc project is here: http://github.com/rsinger/enhanced-marc To install it:

gem sources -a http://gems.github.com
sudo gem install rsinger-enhanced_marc

Let me know if you have any problems or suggestions with either of these. Thanks! -Ross.
[CODE4LIB] A few Ruby MARC announcements
Hi everybody, Apologies for the crossposting. I wanted to let people know that Ruby MARC 0.3.0 was just released as a gem. This version addresses the biggest complaint about Ruby MARC, which was the fact that it could only parse MARCXML with REXML, Ruby's native XML parser (which, if you've used it, you hate it). Now you can use Nokogiri (http://nokogiri.rubyforge.org/) or, if you're using JRuby, jrexml instead of REXML, if you want. This release *shouldn't* break anything. The rubyforge project is here: http://rubyforge.org/projects/marc The rdocs are here: http://marc.rubyforge.org/ The source is here: http://marc.rubyforge.org/svn/ To install: sudo gem install marc While I'm making MARC and Ruby related announcements, I'd like to point out a project I released a couple of weeks ago that sits on top of Ruby MARC, called enhanced-marc. It's basically a domain specific language for working with the MARC fixed fields and providing a set of objects and methods to more easily parse what the record is describing. For example:

require 'enhanced_marc'

reader = MARC::Reader.new('marc.dat')
records = []
reader.each do |record|
  records << record
end

records[0].class                     # => MARC::BookRecord
records[0].is_conference?            # => false
records[0].is_manuscript?            # => false
# Send a boolean true if you want human readable forms, rather than MARC codes.
records[0].literary_form(true)       # => "Non-fiction"
records[0].nature_of_contents(true)  # => ["Bibliography", "Catalog"]
records[1].class                     # => MARC::SoundRecord
records[1].composition_form(true)    # => "Jazz"
records[2].class                     # => MARC::MapRecord
records[2].projection(true)          # => ["Cylindrical", "Mercator"]
records[2].relief(true)              # => ["Color"]

The enhanced-marc project is here: http://github.com/rsinger/enhanced-marc To install it:

gem sources -a http://gems.github.com
sudo gem install rsinger-enhanced_marc

Let me know if you have any problems or suggestions with either of these. Thanks! -Ross.
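What enhanced-marc's record classes boil down to is dispatch on the MARC fixed fields; here is a stdlib-only sketch of the idea, keying on leader byte 6 (type of record). The mapping is abbreviated and the class names are plain strings, not the gem's actual API:

```ruby
# Abbreviated MARC 21 'type of record' (leader byte 6) dispatch -- the
# kind of decision enhanced-marc makes when handing back record classes.
# Not the full MARC 21 table.
RECORD_TYPES = {
  'a' => 'BookRecord',   # language material
  'e' => 'MapRecord',    # cartographic material
  'i' => 'SoundRecord',  # nonmusical sound recording
  'j' => 'SoundRecord'   # musical sound recording
}

def record_class(leader)
  RECORD_TYPES.fetch(leader[6], 'Record')
end

puts record_class('01051cam a2200265 a 4500')  # 'a' at byte 6 -> BookRecord
```

(The real gem consults more than one byte -- leader 7 distinguishes monographs from serials, for instance -- but the dispatch principle is the same.)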
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Owen, I might have missed it in this message -- my eyes are starting to glaze over at this point in the thread -- but can you describe how the input of these resources would work? What I'm basically asking is: what would the professor need to do to add a new citation for a 70-year-old book; a journal on PubMed; a URL to CiteSeer? How does their input make it into your database? -Ross. On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is, saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could be used to give some indication (but I suspect not unambiguously). But I still think what you want is simply a purl server. What makes you think you want OpenURL in the first place? But I still don't really understand what you're trying to do: deliver consistency of approach across all our references -- so are you using OpenURL for its more conventional use too, but you want to tack on a purl-like functionality to the same software that's doing something more like a conventional link resolver? I don't completely understand your use case. I wouldn't use OpenURL just to get a persistent URL - I'd almost certainly look at PURL for this. But I want something slightly different. I want our course authors to be able to use whatever URL they know for a resource, but still try to ensure that the link works persistently over time. I don't think it is reasonable for a user to have to know a 'special' URL for a resource - and this approach means establishing a PURL for all resources used in our teaching material whether or not it moves in the future - which is an overhead it would be nice to avoid. 
You can hit delete now if you aren't interested, but ... ... perhaps if I just say a little more about the project I'm working on it may clarify... The project I'm working on is concerned with referencing and citation. We are looking at how references appear in teaching material (esp. online) and how they can be reused by students in their personal environment (in essays, later study, or something else). The references that appear can be to anything - books, chapters, journals, articles, etc. Increasingly of course there are references to web-based materials. For print material, references generally describe the resource and nothing more, but for digital material references are expected not only to describe the resource, but also state a route of access to the resource. This tends to be a bad idea when (for example) referencing e-journals, as we know the problems that surround this - many different routes of access to the same item. OpenURLs work well in this situation and seem to me like a sensible (and perhaps the only viable) solution. So we can say that for journals/articles it is sensible to ignore any URL supplied as part of the reference, and to form an OpenURL instead. If there is a DOI in the reference (which is increasingly common) then that can be used to form a URL using DOI resolution, but it makes more sense to me to hand this off to another application rather than bake this into the reference - and OpenURL resolvers are reasonably set to do this. If we look at a website it is pretty difficult to reference it without including the URL - it seems to be the only good way of describing what you are actually talking about (how many people think of websites by 'title', 'author' and 'publisher'?). For me, this leads to an immediate confusion between the description of the resource and the route of access to it. So, to differentiate I'm starting to think of the http URI in a reference like this as a URI, but not necessarily a URL. 
We then need some mechanism to check, given a URI, what the URL is. Now I could do this with a script - just pass the URI to a script that checks what URL to use against a list and redirects the user if necessary. On this point Jonathan said "if the usefulness of your technique does NOT count on being inter-operable with existing link resolver infrastructure... PERSONALLY I would be using OpenURL, I don't think it's worth it" - but it struck me that if we were passing a URI to a script, why not pass it in an OpenURL? I could see a number of advantages to this in the local context: Consistency - references to websites get treated the same as references to journal articles; this means a single approach on the course side, with flexibility. Usage stats - we could collect these whatever, but if we do it via OpenURL we get this in the same place as the stats about usage of other scholarly material and could consider driving
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Given that the burden of creating these links is entirely on RefWorks / Telstar, OpenURL seems as good a choice as anything (since anything would require some other service, anyway). As long as the profs aren't expected to mess with it, I'm not sure that *how* you do the indirection matters all that much and, as you say, there are added bonuses to keeping it within SFX. It seems to me, though, that your rft_id should be a URI to the db you're using to store their references, so your CTX would look something like: http://res.open.ac.uk/?rfr_id=info:/telstar.open.ac.uk&rft_id=http://telstar.open.ac.uk/1234&dc.identifier=http://bbc.co.uk/ # not url encoded because I have, you know, a life. I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. This way your citations are unique -- somebody pointing at today's London Times frontpage isn't the same as somebody else's on a different day. While I'm shocked that I agree with using OpenURL for this, it seems as reasonable as any other solution. That being said, unless you can definitely offer some other service besides linking to the resource, I'd avoid the resolver menu completely. -Ross. On Tue, Sep 15, 2009 at 11:17 AM, O.Stephens o.steph...@open.ac.uk wrote: Ross - no, you didn't miss it. There are 3 ways that references might be added to the learning environment: An author (or realistically a proxy on behalf of the author) can insert a reference into a structured Word document from an RIS file. This structured document (XML) then goes through a 'publication' process which pushes the content to the learning environment (Moodle), including rendering the references from RIS format into a specified style, with links. 
An author/librarian/other can import references to a 'resources' area in our learning environment (Moodle) from an RIS file. An author/librarian/other can subscribe to an RSS feed from a RefWorks 'RefShare' folder within the 'resources' area of the learning environment. In general the project is focussing on the use of RefWorks - so although the RIS files could be created by any suitable s/w, we are looking specifically at RefWorks. How you get the reference into RefWorks is something we are looking at currently. The best approach varies depending on the type of material you are looking at: For websites, it looks like the 'RefGrab-it' bookmarklet/browser plugin (depending on your browser) is the easiest way of capturing website details. For books, probably a Union catalogue search from within RefWorks. For journal articles, probably a Federated search engine (SS 360 is what we've got). Any of these could be entered by hand of course, as could several other kinds of reference. Entering the references into RefWorks could be done by an author, but it is more likely to be done by a member of clerical staff or a librarian/library assistant. Owen Owen Stephens TELSTAR Project Manager Library and Learning Resources Centre The Open University Walton Hall Milton Keynes, MK7 6AA T: +44 (0) 1908 858701 F: +44 (0) 1908 653571 E: o.steph...@open.ac.uk -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: 15 September 2009 15:56 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Implementing OpenURL for simple web resources Owen, I might have missed it in this message -- my eyes are starting glaze over at this point in the thread, but can you describe how the input of these resources would work? What I'm basically asking is -- what would the professor need to do to add a new: citation for a 70 year old book; journal on PubMed; URL to CiteSeer? How does their input make it into your database? -Ross. 
On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could be used to give some indication (but I suspect not unambiguously) But I still think what you want is simply a purl server. What makes you think you want OpenURL in the first place? But I still don't really understand what you're trying to do: deliver consistency of approach across all our references -- so are you using OpenURL for it's more conventional use too, but you want to tack on a purl-like functionality to the same software that's doing something more like a conventional link resolver? I don't completely understand your use case
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Oh yeah, one thing I left off -- in Moodle, it would probably make sense to link to the URL in the <a> tag: <a href="http://bbc.co.uk/">The Beeb!</a> but use a javascript onMouseDown action to rewrite the link to route through your funky link resolver path, a la Google. That way, the page works like any normal webpage -- right mouse click, Copy Link Location gives the user the real URL to copy and paste -- but normal behavior funnels through the link resolver. -Ross. On Tue, Sep 15, 2009 at 11:41 AM, Ross Singer rossfsin...@gmail.com wrote: Given that the burden of creating these links is entirely on RefWorks / Telstar, OpenURL seems as good a choice as anything (since anything would require some other service, anyway). As long as the profs aren't expected to mess with it, I'm not sure that *how* you do the indirection matters all that much and, as you say, there are added bonuses to keeping it within SFX. It seems to me, though, that your rft_id should be a URI to the db you're using to store their references, so your CTX would look something like: http://res.open.ac.uk/?rfr_id=info:/telstar.open.ac.uk&rft_id=http://telstar.open.ac.uk/1234&dc.identifier=http://bbc.co.uk/ # not url encoded because I have, you know, a life. I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. This way your citations are unique -- somebody pointing at today's London Times frontpage isn't the same as somebody else's on a different day. While I'm shocked that I agree with using OpenURL for this, it seems as reasonable as any other solution. That being said, unless you can definitely offer some other service besides linking to the resource, I'd avoid the resolver menu completely. -Ross. 
On Tue, Sep 15, 2009 at 11:17 AM, O.Stephens o.steph...@open.ac.uk wrote: Ross - no you didn't miss it, There are 3 ways that references might be added to the learning environment: An author (or realistically a proxy on behalf of the author) can insert a reference into a structured Word document from an RIS file. This structured document (XML) then goes through a 'publication' process which pushes the content to the learning environment (Moodle), including rendering the references from RIS format into a specified style, with links. An author/librarian/other can import references to a 'resources' area in our learning environment (Moodle) from a RIS file An author/librarian/other can subscribe to an RSS feed from a RefWorks 'RefShare' folder within the 'resources' area of the learning environment In general the project is focussing on the use of RefWorks - so although the RIS files could be created by any suitable s/w, we are looking specifically at RefWorks. How you get the reference into RefWorks is something we are looking at currently. The best approach varies depending on the type of material you are looking at: For websites it looks like the 'RefGrab-it' bookmarklet/browser plugin (depending on your browser) is the easiest way of capturing website details. 
For books, probably a Union catalogue search from within RefWorks For journal articles, probably a Federated search engine (SS 360 is what we've got) Any of these could be entered by hand of course, as could several other kinds of reference Entering the references into RefWorks could be done by an author, but it more likely to be done by a member of clerical staff or a librarian/library assistant Owen Owen Stephens TELSTAR Project Manager Library and Learning Resources Centre The Open University Walton Hall Milton Keynes, MK7 6AA T: +44 (0) 1908 858701 F: +44 (0) 1908 653571 E: o.steph...@open.ac.uk -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: 15 September 2009 15:56 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Implementing OpenURL for simple web resources Owen, I might have missed it in this message -- my eyes are starting glaze over at this point in the thread, but can you describe how the input of these resources would work? What I'm basically asking is -- what would the professor need to do to add a new: citation for a 70 year old book; journal on PubMed; URL to CiteSeer? How does their input make it into your database? -Ross. On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could
Re: [CODE4LIB] Implementing OpenURL for simple web resources
On Tue, Sep 15, 2009 at 12:06 PM, Eric Hellman e...@hellman.net wrote: Yes, you can. In this case, I say punt on dc.identifier, throw the URL in rft_id (since, Eric, you had some concern regarding using the local id for this?) and let the real URL persistence/resolution work happen with the by-ref negotiation. -Ross. On Sep 15, 2009, at 11:41 AM, Ross Singer wrote: I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/
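Mechanically, the context object this thread keeps sketching by hand is just a query string. A stdlib-only Ruby sketch of assembling one with the website's URI in rft_id -- the resolver base appears upthread, but the rfr_id value is a placeholder, and real KEV OpenURLs carry more keys than this:

```ruby
require 'uri'

# Build a simple OpenURL carrying the referenced website's URI as rft_id.
# The rfr_id value here is an illustrative placeholder.
def website_openurl(resolver_base, website_uri)
  params = {
    'url_ver' => 'Z39.88-2004',
    'rfr_id'  => 'info:sid/example.open.ac.uk',
    'rft_id'  => website_uri
  }
  "#{resolver_base}?#{URI.encode_www_form(params)}"
end

puts website_openurl('http://res.open.ac.uk/', 'http://bbc.co.uk/')
```

Unlike the hand-written examples upthread, URI.encode_www_form takes care of the URL encoding.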
Re: [CODE4LIB] FW: PURL Server Update 2
On Tue, Sep 1, 2009 at 7:51 PM, Edward M. Corrado ecorr...@ecorrado.us wrote: Thus I have to believe them that they did not have a compromised server and instead they had a hardware failure. I have no idea why they couldn't just restore from backup, which would at least have gotten them back to where they were as of the last backup (which presumably was at most a week old; if not, someone should have a lot of explaining to do to someone). I didn't want to join this speculation party, but here goes. It's quite possible that part of the problem here is that the significant hardware failure meant that the replacement was a completely different architecture (let's say, for argument's sake, that the server that failed was AS/400 and the replacement was Solaris on an Intel server) because IT policy (or, you know, reality) dictated that the old hardware would be replaced if it failed. So then we're not just talking about restoring from tape -- things need to be compiled -- there are perhaps problems with legacy C libraries, character sets, *whatever*. When I was working at Emory, we had a grant-funded project that indexed a handful of collections of SGML EAD files in an app called iSearch (http://www.etymon.com/tr.html#). When the (admittedly neglected) VA Linux server it ran on had a major problem, it was insanely non-trivial to get this completely orphaned application running in a contemporary operating system (in this case, RedHat). Old versions of iSearch /would not under any circumstances/ compile -- new ones couldn't read the old data. The application was down for -- I don't know -- months, IIRC. Granted, this was nowhere near the priority of GPO's PURL server -- but you can't stop time to solve these sorts of Catch-22s, either. Things happen. Catastrophes generally have the added advantage of ensuring they don't happen again for a while. -Ross.
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
On Wed, Aug 12, 2009 at 10:48 AM, Karen Coyle li...@kcoyle.net wrote: Ross Singer wrote: 3) What, specifically, is missing from DCTerms that would make a MODS ontology needed? What, specifically, is missing from Bibliontology or MusicOntology or FOAF or SKOS, etc. that justifies a new and, in many places, overlapping vocabulary? Would time be better spent trying to improve the existing vocabularies? MARC: 182 fields, 1711 subfields, 2401 fixed field values. DC: 59 properties. I see where you're going with this, but I'm not sure it's a fair critique. It's sort of on par with saying that a Dodge Grand Caravan is a more sophisticated vehicle than a Mini Cooper because it has more horsepower, 3 times as many cup holders and vastly more cubic footage in the interior. A Caravan /may/ be a more sophisticated vehicle, but I'm not sure a quick run over the specs can necessarily reveal that. One of the problems here is that it doesn't begin to address the DCAM -- these are 59 properties that can be reused among 22 classes, giving them different semantic meaning. Look at the sample records in MARCXML and DC at http://www.loc.gov/standards/marcxml and you will see how lossy it is. Now I think you know you're being a little misleading here. For one thing, it's using DC Elements and it's not doing /anything/ vaguely RDF-related. Unfortunately, I think it's examples like this that have led libraries to write DC off as next to worthless (and understandably!). Dublin Core is toothless and practically worthless in XML form. It is considerably more powerful when used in RDF, however, because vocabularies there play to their mutual strengths: in RDF, you generally don't use a schema in isolation. Now, you could argue that no one needs all of the detail in MARC, and I'm sure it could be reduced down to something more rational, plus there is redundancy in it, but for pity's sake, DC doesn't have a way to indicate the EDITION of a work. This is true. 
But this is also why I'm asking what is missing in DCTerms that would be available in MODS -- the win of RDF is that you aren't constrained by the limits of a particular schema. If a particular vocabulary gets you a fair ways towards representing your resource, but something is missing, it's perfectly reasonable (and expected) to plug in other vocabularies to fill in the gaps. For example, SKOS doesn't need to add coordinate properties to properly define locations. Instead, you pull in a vocabulary that is optimized for defining geographic place (say, wgs84) and, rather than suboptimally retrofitting a vocabulary designed for modeling thesauri, use one that is explicitly intended to model the resource at hand (and, preferably, only that). I think it's somewhat analogous to the notion of domain-specific languages: there's an abstraction between the resource and the most efficient way to access it. FOAF has both *surname* and *family name* and says: These are not currently stable or consistent... No sh*t. And try to clearly code a name like Pope John Paul II in FOAF. Oh, and death dates. No death dates in FOAF because you wouldn't have DEAD FRIENDS. But authors die. FOAF isn't the only vocabulary available to model people, and I'm hardly saying it's the answer here. I mean, MARC is complicated in this regard, too. Rodrigo Jimenez Hernandez Garcia. Liu Ming Chung. Names are hard. I think pretty much any schema is going to have to have rules and conventions to compensate for the variability of how different cultures prescribe identity. Maybe vCard would be better (maybe not). The Bio vocabulary might be a better option for defining biographical events (birth, death, etc.). It lacks some of the attributes that libraries use (flourishing dates, for example) and shares RDF's inherent disadvantage that it can't express inexact dates very well. I think a common misperception of RDF in library circles is that there is no vocabulary that does everything we need. 
Rather, I think that this is one of RDF's strengths: no vocabulary can successfully model the universe, so, instead, focus on the specifics. The library world takes the opposite approach, which tends to cause things to get shoehorned in to meet the shape of the model rather than be expressed in a way more naturally suited to the resource. -Ross.
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
On Wed, Aug 12, 2009 at 1:45 PM, Karen Coyleli...@kcoyle.net wrote: Ross Singer wrote: One of the problems here is that it doesn't begin to address the DCAM -- these are 59 properties that can be reused among 22 classes, giving them different semantic meaning. Uh, no. That's the opposite of what the DC terms are about. Each term has a defined range -- so the defined range of creator is Agent. It can only be used as an Agent. You don't mix and match, and you don't assign different semantics to the same property under different circumstances. Jason clarified what I meant much better than I did, but I will take this a step further -- the DC properties have ranges, but only 5 have a constraint on their domain. So while dct:creator has to point at a dct:Agent (or some equivalent), where the dct:creator property lives can be anything.

<dct:Location about="#RhodeIsland">
  <dct:title>Rhode Island</dct:title>
  <dct:creator>
    <dct:Agent about="#RogerWilliams">
      <dct:title>Roger Williams</dct:title>
      <dct:creator>
        <dct:Agent about="#JamesWilliams">
          <dct:title>James Williams</dct:title>
        </dct:Agent>
        <dct:Agent about="#AliceWilliams">
          <dct:title>Alice Williams</dct:title>
        </dct:Agent>
      </dct:creator>
    </dct:Agent>
  </dct:creator>
</dct:Location>

The definition of dct:title is: "A name given to the resource." So dct:title could be "re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards", "Ross Singer" or "Chattanooga, TN" depending on what resource we're talking about. Maybe semantics is a poor word choice, but I think "Ross Singer" as the title of an Agent resource or "Chattanooga, TN" as the title of a Location resource have some conceptual distinctions from "For Whom the Bell Tolls". The creation of Rhode Island also carries a different mental image than the creation of Roger Williams. It seems like context influences semantics, at least somewhat. Dublin Core is toothless and practically worthless in XML form. 
It is considerably more powerful when used in RDF, however, because they play to their mutual strengths, namely that in RDF, you generally don't use a schema in isolation. The elements in Dublin Core are the elements in Dublin Core. The serialization shouldn't really matter. But if you need to distinguish between title and subtitle, Dublin Core's http://purl.org/dc/terms/title doesn't work. What matters is the actual *meaning* of the term, and the degree of precision you need. You can't use http://purl.org/dc/terms/title for Mr. or Dr. in a name -- it has a particular meaning. And you can't use it for title proper as defined in library cataloging, because it doesn't have that meaning. It all depends on what you are trying to say. Again, Jason did a good job of explaining the difference. Dublin Core in XML (at least in every example I've ever seen) consists solely of literals. The values are text, not resources -- so in XML DC, not only would you be unable to attach, say, birth and death date properties to Roger Williams, you also wouldn't be able to say who his creators are. Going back to context defining semantics, I don't think it's unreasonable to say that dct:title does mean title distinct from subtitle if that's the expectation of how dct:title is to work within your vocabulary/class.

<ex:Book about="http://example.org/ex/1234">
  <dct:title>Zen and the Art of Motorcycle Maintenance</dct:title>
  <ex:subTitle>An Inquiry into Values</ex:subTitle>
</ex:Book>

The definition of dct:title is pretty ambiguous -- alternately, you might choose to use dct:title to contain the full title and define some other property for main title. Just because title doesn't have a clear definition doesn't mean rules can't be applied to it when used in a particular domain (assuming they conform to "the name of the resource"). This is true. 
But this is also why I'm asking what is missing in DCTerms that would be available in MODS -- The win of RDF is that you aren't constrained by the limits of a particular schema. If a particular vocabulary gets you a fair ways towards representing your resource, but something is missing, it's perfectly reasonable (and expected) to plug in other vocabularies to fill in the gaps. Exactly. But the range of available vocabularies today is quite limited. There are a lot of semantics that are used in libraries that I can't find in the available vocabularies. Eventually I think we will have what we need, but ... well, yesterday I was hunting all over for a data element for price. And the person who needed it didn't want to get into the complexity of ONIX. BIBO doesn't have it. DC doesn't have it. RDA doesn't have it. Something that simple. Well, Bibo doesn't have it because it has nothing to do with citations. GoodRelations (http://www.heppnetz.de/projects/goodrelations/) does, but I admit that its usage seems rather baroque: http://www4.wiwiss.fu-berlin.de/bookmashup/doc/offers/0596000278googleOffer6997796095130913016
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
Whew -- just hit discard on my last message. On Wed, Aug 12, 2009 at 9:07 PM, Karen Coyleli...@kcoyle.net wrote: then my question is: has B changed? In other words, is B of class X the same as B of class Y? (Assuming that both B's have the same URI.) B (for our purposes we'll say it's "http://example.org/ex/B") can claim it's of as many types as the assertor is willing to predicate (making up words all over this place) as long as none of the classes anywhere assert that they owl:disjointWith (or some similar != assertion) another adopted type. So:

<rdf:Description about="http://example.org/ex/B">
  <rdf:type resource="http://vocab.org/frbr/core#Manifestation" />
  <dct:title>Zen and the Art of the Motorcycle Maintenance</dct:title>
  <rdf:type resource="http://purl.org/ontology/bibo/Book" />
  <bibo:isbn10>0553277472</bibo:isbn10>
</rdf:Description>

Ok -- everything's still in the clear. We've asserted that this resource is a book and that, in FRBR terms, it's also a manifestation. Both of these assertions are true but they're talking about the same resource in different vocabularies -- basically they describe the same thing in different world views: the FRBR model has no knowledge (nor need for knowledge) of bibliographic metadata and vice versa. Now, should you append to this graph something like:

<rdf:type resource="http://vocab.org/frbr/core#Text" />

you've run aground. The FRBR schema claims that by being a Text (of course it makes no mention of what exactly that means) it implies also being an Expression, but it also defines that frbr:Expression owl:disjointWith frbr:Manifestation (and vice-versa): that is, your resource can't be both an Expression and Manifestation, which makes sense. Now, this doesn't mean that Books and Manifestations are the same thing, it's just that /this/ book also happens to be a manifestation. As far as your point about context goes, I think this comes down to trust, credibility and provenance. 
Even if you define special properties to contain specific parts of your data, there is no way to enforce it. For example, let's say our new RDA vocabulary has: rda:titleProper and rda:remainderOfTitle, and all ILMSes move to an RDA/RDF model (I mean, yes, we're wandering into fantasyland, just bear with me) and begin to store our resources using this as the main data model. Now let's say we have a stash of data we'd like to add to our collection: maybe it's an e-book collection or a set of aggregated OA e-journals, a la DOAJ. The providers of this data are told we need it in our new RDA format and they comply. Let's say, though, that they weren't discriminating enough to distinguish the titleProper from the remainderOfTitle internally but in an effort to comply with our request, put their string in rda:titleProper (it's got to go somewhere, after all) and call it a day. Uncertainty has crept into the mix. After all, there's nothing, technically, stopping me from entering "Zen and the art of motorcycle maintenance: an inquiry into values" all in the 245$a. I think that replacing dct:title with rda:titleProper (rather than declaring that when used in RDA, dct:title should be the proper title) won't drastically help the purity of our data (especially if one of the motivations of RDA and RDF is the promise of externally supplied data) and will have the consequence of being in a vocabulary off the radar for anybody not in a library (and therefore ignored). It's a tough call, though. -Ross.
Re: [CODE4LIB] [Fwd: [ol-tech] Modified RDF/XML api]
Karen, The Bio vocabulary might help with the birth/death dates: http://vocab.org/bio/0.1/.html And foaf:isPrimaryTopicOf http://xmlns.com/foaf/spec/#term_isPrimaryTopicOf might be a good way to relate to the wikipedia page. I don't have any recommendation for alternate names (and would be interested in knowing of any, myself). All this isn't to discourage using the RDA vocabulary for any of this, but my concern is that its complexity, lack of documentation and kitchen sink approach will be daunting, especially for people coming from outside the library domain. I sort of look at RDA as the ontology of last resort. -Ross. On Tue, Aug 11, 2009 at 12:24 PM, Karen Coyleli...@kcoyle.net wrote: OK! thanks. There must be some default operating there... RDF for authors is now on to do list! Here are the data elements available: name, alternate names, website, birth date, death date, wikipedia link. FOAF doesn't cover death dates... RDA has death dates, alternate names. Should FOAF be used where possible, adding in RDA to fill in? There are a lot of elements they have in common. kc Ed Summers wrote: On Tue, Aug 11, 2009 at 10:40 AM, Karen Coyleli...@kcoyle.net wrote: Ed, I have NO IDEA how you got to rdf/xml from the OL author link -- do tell, and I'll take a look! There is no RDF/XML export template for authors, but one could be created. The URI/URL is simply the address of the author page, and also considered the author identifier on OL. The nice thing about this linked data stuff is all you have to do is follow your nose:

e...@rorty:~$ curl --include --header "Accept: application/rdf+xml" http://openlibrary.org/a/OL1518080A
HTTP/1.1 200 OK
Content-Type: application/rdf+xml; charset=utf-8
Date: Tue, 11 Aug 2009 15:01:32 GMT
Server: lighttpd/1.4.19
Transfer-Encoding: chunked
Connection: Keep-Alive
Age: 0

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:ol='http://openlibrary.org/type/author'>
  <ol:name>Lawrence Lessig</ol:name>
  <ol:personal_name>Lawrence Lessig</ol:personal_name>
  <ol:key>/a/OL1518080A</ol:key>
  <ol:type>http://openlibrary.org/type/author.rdf</ol:type>
  <ol:id>5209974</ol:id>
</rdf:RDF>

//Ed -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] Long way to be a good coder in library
On Wed, Jul 22, 2009 at 8:54 AM, Jon Gormanjonathan.gor...@gmail.com wrote: As far as languages, I'd probably lean towards ruby or python for starters or maybe Java. Then move into php after you have a grasp of good programming practice. You'll also figure out more what you like to work on. Given the plaintive tone of the original post, I disagree with this advice. Development is almost solely based on confidence and experience (with the latter affecting the former and vice-versa). Good code is secondary. I would almost certainly say start out with a procedural scripting language (or at least a procedural approach) that is more common and Googleable (PHP immediately comes to mind). The nice thing about something like PHP, in my mind, is that it's incredibly easy to see immediate results without having any real idea of what's going on (that being said, I have _no_ idea what Wayne's background might be -- perhaps this advice is too novice). As many others have replied, it's so much easier to learn by solving an actual problem (rather than following the 'pet store' example in your tutorial) and, in my mind, PHP is the easiest way to get off the ground. Successes breed confidence to take on bigger projects, etc. Once you've realized that this stuff isn't rocket science, /then/ break out the theory, find a different language (perhaps more suited to the task at hand -- or not!) and think about good code. Rob Styles sent this to my delicious account the other day (I'm not sure what he was trying to tell me): http://cowboyprogramming.com/2007/01/18/the-seven-stages-of-programming/ which I think sums up the arc pretty well. -Ross.
[CODE4LIB] Fwd: [NGC4LIB] Integrating with your ILS through Web services and APIs
This seems a _far_ more appropriate list for these questions. -Ross. -- Forwarded message -- From: Breeding, Marshall marshall.breed...@vanderbilt.edu Date: Wed, Jul 22, 2009 at 9:53 PM Subject: [NGC4LIB] Integrating with your ILS through Web services and APIs To: ngc4...@listserv.nd.edu I am in the process of writing an issue of Library Technology Reports for ALA TechSource titled "Hype or reality: Opening up library systems through Web Services and SOA." Today almost all ILS products make claims regarding offering more openness through APIs, Web services, and through a service-oriented architecture (SOA). This report aims to look beyond the marketing claims and identify specific types of tasks that can be accomplished beyond the delivered interfaces through programmatic access to the system internals. As part of the research for this article I am soliciting feedback from libraries that have taken advantage of Web Services or other APIs in conjunction with their core Integrated Library System (ILS) to meet specific needs. I'm interested in hearing about how you might have been able to integrate library content and services into applications, extracted data, automated processes or other novel applications. Please tell me about your experiences with your ILS in regard to the APIs it offers:
- Do you feel like you can pretty much do anything you want with the system, or do you feel constrained?
- Are the APIs offered able to address all the data and functionality within the ILS?
- On the flip side, do you feel like your ILS is too closed?
- Do you find the APIs offered by the developer of the ILS to be well documented?
- What programming languages or other tools were you able to use to take advantage of these APIs?
- What level of programming proficiency is required: Systems librarian with scripting languages, software development engineer, or something in between?
- What's on your wish list? What kind of APIs would you like to see incorporated into your current or next ILS? 
- I'm interested in responses from those that use open source ILS products as well. Are you able to programmatically interact with the ILS?
- Do you consider your ILS as embracing a true service-oriented architecture? Systems vendors increasingly promote their ILS as SOA. Can you provide examples where the ILS does or does not exhibit traits of SOA in your environment?
While it's important for the ILS to offer support for standard protocols such as Z39.50, NCIP, and OAI, that's not the core of the issue here. What I'm looking for are APIs that allow the library to get at data and functionality not addressed by these protocols. Thanks in advance for sharing your experiences with ILS APIs for this report. I appreciate your assistance. -marshall Summary excerpt: Libraries increasingly need to extract data, connect with external systems, and implement functionality not included with the delivered systems. Rather than being reliant on the product's developers for enhancements to meet these needs, libraries increasingly demand the ability to exploit their systems using APIs, Web Services, or other technologies. Especially in libraries that exist in complex environments where many different systems need to interact, the demand for openness abounds. As libraries develop their IT infrastructure, it's imperative to understand the extent to which their automation products are able to interoperate and thrive in this growing realm of Web services. This report aims to assess the current slate of major library automation systems in regard to providing openness through APIs, Web Services, and the adoption of SOA. Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Editor, Library Technology Guides http://www.librarytechnology.org 615-343-6094
Re: [CODE4LIB] rdf files as linked data
Oops, scratch my warning at the end of point 5. It shouldn't affect the point 1 strategy at all. Like I said, httpRange-14 is confusing :) -Ross. On Mon, Jul 20, 2009 at 10:58 PM, Ross Singerrossfsin...@gmail.com wrote: I'll pile on with a couple of other things:
1. I second Ed's point about conneg: http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221 should probably return a 300 code with pointers to your various file types.
2. Replace dc with dcterms (http://purl.org/dc/terms/)
3. While Ed's point about linking to other resources would be nice, first I'd focus on the resources you have and can control. Rather than a literal for dc:creator, can you mint URIs for all of your authors? How about subjects?
4. Your URIs in your rdf:Description[@rdf:about] aren't terribly helpful on their own. Either give the full URI here or add an xml:base="http://infomotions.com/etexts/literature/english/1500-1599/" attribute to the tag -- that should improve things.
5. I think your dc:contributor tag might be running aground of httpRange-14 -- I'm pretty sure you didn't help Thomas More write his story. This, I think, is the absolute hardest thing to get right with RDF/LOD. A nice example of sidestepping this sort of collision is Toby Inkster's RDF-ification of Amazon Web Services: http://purl.org/NET/book/isbn/0140449108#book -- in this example, the 'record metadata' lives at the base URI (http://purl.org/NET/book/isbn/0140449108) and the real world object lives at http://purl.org/NET/book/isbn/0140449108#book. This way Toby can claim responsibility for making the data available, but not assert that he had any part in creating the work itself. The two resources are linked to each other, but are each unique, independent URIs. If you do do this, though, it messes up what I said in point #1.
The concordances would also be really neat to see -- building off of WordNet would be pretty cool with all of these old texts. Good luck, it's great to see. -Ross. 
On Mon, Jul 20, 2009 at 10:04 PM, Ed Summerse...@pobox.com wrote: Heya Eric: The main thing you'd want to do would be to make sure URIs like: http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221 returned something useful for both people and machine agents. The nitty gritty details of how to do this can roughly be found in Cool URIs for the Semantic Web [1], or How to Publish Linked Data [2]. A slight variation would be to use something like RDFa [3] to embed metadata in your HTML docs, or GRDDL [4] to provide a stylesheet to transform some HTML to RDF. The end goal of linked data is to provide contextual links from your stuff to other resources on the web, aka timbl's rule #4: "Include links to other URIs. so that they can discover more things." [5] So for example you might want to assert that:

<http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221> owl:sameAs <http://dbpedia.org/page/Utopia_(book)> .

or:

<http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221> dcterms:creator <http://dbpedia.org/resource/Thomas_More> .

It's when you link out to other resources on the web that things get interesting, more useful, and potentially more messy :-) For example instead of owl:sameAs perhaps an assertion using FRBR or RDA would be more appropriate. Thanks for asking the question. The public-lod list [6] at the w3c is also a really friendly/helpful group of people making data sets available as linked-data. //Ed [1] http://www.w3.org/TR/cooluris/ [2] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ [3] http://www.w3.org/TR/xhtml-rdfa-primer/ [4] http://www.w3.org/TR/grddl-primer/ [5] http://www.w3.org/DesignIssues/LinkedData.html [6] http://lists.w3.org/Archives/Public/public-lod/
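"Returning something useful for both people and machine agents" usually comes down to content negotiation on the Accept header. A deliberately naive sketch of just the dispatch step (no q-value parsing, and the filenames are hypothetical):

```python
def choose_representation(accept_header: str) -> str:
    """Pick a document for a resource URI based on the client's Accept header.

    A simplification: real content negotiation parses q-values and
    typically answers with a redirect to the chosen document.
    """
    preferences = [
        ("application/rdf+xml", "more-utopia-221.rdf"),
        ("text/html", "more-utopia-221.html"),
    ]
    for media_type, document in preferences:
        if media_type in accept_header:
            return document
    return "more-utopia-221.html"  # browsers and unknown agents get HTML

print(choose_representation("application/rdf+xml"))
```

This mirrors the follow-your-nose pattern in the curl example elsewhere in this digest: a client sending Accept: application/rdf+xml is steered to the RDF document, everyone else to the HTML.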
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
On Wed, Jul 15, 2009 at 8:57 AM, Ray Denenberg, Library of Congressr...@loc.gov wrote: Ross, if you're talking about the ISO 20775 xml schema: http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd It's free. It's also not a spec, it's a schema. If the expectation is that people are actually going to adopt a standard from merely looking at an .xsd, my prediction is that this will go nowhere. I mean, I'm wrong a lot, but I feel pretty good about this reading from my crystal ball. -Ross.
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
Well, it's not a great example, because I don't have a 'counter-example', but I think it remains to be seen if ISO 20775 goes anywhere if it, too, remains behind a pay wall. If an open spec were to come along that allowed the transfer of holdings and availability information that was decent and simple, it would basically render ISO 20775 irrelevant (if the pay wall doesn't already). RDA, I think, might also suffer from this problem. -Ross. On Tue, Jul 14, 2009 at 10:35 AM, Walter Lewislew...@hhpl.on.ca wrote: William Wueppelmann wrote: [snip] I'm not entirely sure that TCP/IP and the other IETF RFCs became established because of restrictions placed on OSI. I was under the impression that OSI was also insanely complicated and that the IETF standards were much cheaper to implement from a technical standpoint. And, from a product standpoint, in the mid-90s, there were still a lot of bets being placed on closed online services like AOL, MSN, and Compuserve. Not to mention the book I once saw on MS Blackbird ... (MSN .0001?) which, thankfully, was abandoned before leaving the nest. Any examples closer to the library world? What I had been hoping for were data standards more in the library space. I've read ANSI's Z39.19, which deals with monolingual thesauri. (A copy lives here: http://www.slis.kent.edu/~mzeng/Z3919/8Z3919toc.htm) Near as I can tell, the parallel multilingual standard is ISO 5964 and is available at http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?ics1=01&ics2=140&ics3=20&csnumber=12159 for a fee of 168 Swiss francs (CHF) or ~$155 USD. I pay attention to the one, and never expect to read the other. This past week I was on the edge of another discussion of standards with associated controlled vocabularies (in the K-12 domain) where a criticism was raised that it wasn't Creative Commons with an Attribution requirement, else how could you teach it? 
That got me thinking about whether we shouldn't have already learned that lesson because the 'net largely runs on public RFCs, but wondered if I wasn't missing other examples inside our domain. Walter
Re: [CODE4LIB] OpenURL question
Stuart, The short answer is probably. The longer answer is that, yes, OpenURL is currently the best way to accomplish what you're looking for. That being said, I think your audience may make this a little more complicated and the solutions perhaps more fragile and hacky. Since you don't have an /institutional/ target audience (at least, that's the impression I get), this falls out of the sort of traditional OpenURL workflow. Generally, you have some a priori knowledge of the affiliation of the service user and can point your OpenURLs at that person's institutional link resolver (so they can get context services that are appropriate for them). When you don't have that, the general answer is to use COinS [1]. However, COinS themselves have no way to associate a person at a web browser with an institutional link resolver. Dan Chudnov wrote a couple years ago about using the OCLC Link Resolver Registry to handle this [2]: grab the user's IP address, do a lookup in the background, and rewrite the COinS to use the person's institutional link resolver, if there is a match. The problem is, because it's based on IP, it's quite possible there will not be a match. There's also the reality that lots and lots (perhaps the majority) of people have no access to a link resolver at all - do they just get left in the dark? It's also possible that some (much?) of what you'd be citing would be available in OA archives or Google Book Search or Open Library. This would make the case to also run a 'default' link resolver, such as the Umlaut [3], to find open web things. Conveniently, the Umlaut is engineered(*) to be able to handle the OCLC Resolver Registry and merge an external resolver (or resolvers) into the options. I fear this doesn't sound terribly encouraging, but good luck, -Ross. 
[1] http://ocoins.info/ [2] http://onebiglibrary.net/story/solving-the-appropriate-resolver-problem [3] http://wiki.code4lib.org/index.php/Umlaut * - Umlaut originally had this functionality, but due to a lack of perceivable need, it's gone a bit to seed. That being said, it's designed to do this and shouldn't be too difficult to reintegrate. On Mon, Jun 29, 2009 at 10:06 PM, stuart yeatesstuart.yea...@vuw.ac.nz wrote: We have an index of place names that we're considering digitising and ingesting into our collection (http://www.nzetc.org/). For each place name a series of bibliographic references (often including page #) list uses of the place name. We want to build a mapping from those bibliographic references to the documents:
* some of the documents are in our electronic collection
* some of the documents are in other people's electronic collections
* some of the documents are not online yet, but may soon be
Is OpenURL the right tool for this job? Is there an implementation / configuration that people suggest for this? cheers stuart -- Stuart Yeates http://www.nzetc.org/ New Zealand Electronic Text Centre http://researcharchive.vuw.ac.nz/ Institutional Repository
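For reference, the OpenURL side of this is mostly string assembly. A minimal sketch of building an OpenURL 1.0 KEV query for a book citation; the key set here is abbreviated, the sample metadata is invented, and any resolver base URL you prepend is site-specific:

```python
from html import escape
from urllib.parse import urlencode

def openurl_kev(metadata: dict) -> str:
    """Build an OpenURL 1.0 KEV query string for a book citation."""
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
    ]
    # Referent keys (rft.*) carry the citation itself.
    pairs += [("rft." + key, value) for key, value in metadata.items()]
    return urlencode(pairs)

query = openurl_kev({"btitle": "Utopia", "au": "Thomas More", "spage": "42"})
print(query)

# A COinS span is just this query, HTML-escaped, in an empty span's title:
coins = '<span class="Z3988" title="%s"></span>' % escape(query)
```

With a known resolver you would prepend its base URL to the query; without one, publishing the COinS span and letting the reader's tooling (or an IP lookup against a resolver registry, as discussed above) supply the resolver is the usual fallback.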
Re: [CODE4LIB] HTML mark-up in MARC records
On Tue, Jun 23, 2009 at 9:39 AM, Casey Bisson cbis...@plymouth.edu wrote: The mistake here is presuming that (X)HTML coded data isn't (or can't be) data. I think it's a greater mistake to assume that it will be. We are talking about putting semantics into a structured data source (by people who are trained in both the data format in particular and the tools to input data into it), why introduce another data format (out of context, I might add -- we're only talking about a single tag) which is much more likely to be adulterated by human input and the fallacies that come with being human? A semantic XHTML serialization of data is very different than ad-hoc usage of HTML for a local need. -Ross.
Re: [CODE4LIB] FW: [CODE4LIB] openurl.info ?
This was the historical (and annoying) behavior. At one point a redirect was added to http://openurl.info/registry that I suppose needs to be recreated. -Ross. On Sun, May 17, 2009 at 3:50 AM, Boheemen, Peter van peter.vanbohee...@wur.nl wrote: Hmm Roy but should it be pointing to the PURL site ? http://www.openurl.info/ = http://purl.oclc.org/ Peter Drs. P.J.C. van Boheemen Hoofd Applicatieontwikkeling en beheer - Bibliotheek Wageningen UR Head of Application Development and Management - Wageningen University and Research Library tel. +31 317 48 25 17 http://library.wur.nl P Please consider the environment before printing this e-mail -Oorspronkelijk bericht- Van: Code for Libraries namens Roy Tennant Verzonden: zo 17-5-2009 2:03 Aan: CODE4LIB@LISTSERV.ND.EDU Onderwerp: [CODE4LIB] FW: [CODE4LIB] openurl.info ? -- Forwarded Message From: Karen Wetzel kwet...@niso.org Reply-To: A discussion listserv for topics surrounding the Open URL NISO standard Z39.88. open...@oclc.org Date: Fri, 15 May 2009 12:11:01 -0400 To: open...@oclc.org Subject: Re: [CODE4LIB] openurl.info ? Greetings, I just wanted to send a quick follow-up on my last note to confirm that we've worked to fix this error and that the www.openurl.info http://www.openurl.info domain is now working again. If you are still experiencing problems with the URL, please do send me a note and I'll be sure to look into it. Again, I apologize on behalf of NISO for this error, and appreciate all your feedback and patience as we worked to resolve this problem. Truly, Karen -- Karen A. Wetzel Standards Program Manager National Information Standards Organization (NISO) One North Charles Street, Suite 1905 Baltimore, MD 21201 Tel.: 301-654-2512 Fax: 410-685-5278 E-mail: kwet...@niso.org On May 15, 2009, at 11:13 AM, Ray Denenberg, Library of Congress wrote: Yes apparently NISO is (was) the owner. I've sent them a note (and anyone else who feels so inclined should too: Email:nis...@niso.org mailto:Email:nis...@niso.org ). 
--Ray - Original Message - From: Venicio mailto:vrbu...@gmail.com To: open...@oclc.org Sent: Friday, May 15, 2009 11:05 AM Subject: Re: [CODE4LIB] openurl.info ? FYI: Domain ID:D2132192-LRMS Domain Name:OPENURL.INFO http://OPENURL.INFO Created On:10-May-2002 17:49:32 UTC Last Updated On:15-May-2009 14:56:44 UTC Expiration Date:10-May-2010 17:49:32 UTC Sponsoring Registrar:DSTR Acquisition PA I, LLC d/b/a Domainbank.com (R107-LRMS) Status:OK Registrant ID:C4373421-LRMS Registrant Name:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Registrant Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Registrant Street1:4733 Bethesda Ave. Registrant Street2:STE 300 Registrant Street3: Registrant City:Bethesda Registrant State/Province:MD Registrant Postal Code:20814 Registrant Country:US Registrant Phone:+1.3016542512 Registrant Phone Ext.: Registrant FAX: Registrant FAX Ext.: Registrant Email:nis...@niso.org mailto:email%3anis...@niso.org Admin ID:DOT-3Q02W1748WCF Admin Name:Pat Stevens Admin Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Admin Street1:4733 Bethesda Ave. Admin Street2: Admin Street3: Admin City:Bethesda Admin State/Province:MD Admin Postal Code:20814 Admin Country:BE Admin Phone:+32.3016542512 Admin Phone Ext.: Admin FAX: Admin FAX Ext.: Admin Email:nis...@niso.org mailto:email%3anis...@niso.org Billing ID:DOT-132FHTD2SCKP Billing Name:Patricia Stevens Billing Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Billing Street1:4733 Bethesda Ave. Billing Street2: Billing Street3: Billing City:Bethesda Billing State/Province:MD Billing Postal Code:20814 Billing Country:BE Billing Phone:+32.3016542512 Billing Phone Ext.: Billing FAX: Billing FAX Ext.: Billing Email:nis...@niso.org mailto:email%3anis...@niso.org Tech ID:DOT-IQIOP5LKRKM0 Tech Name:Pat Stevens Tech Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Tech Street1:4733 Bethesda Ave. 
Tech Street2: Tech Street3: Tech City:Bethesda Tech State/Province:MD Tech Postal Code:20814 Tech Country:BE Tech Phone:+32.3016542512 Tech Phone Ext.: Tech FAX: Tech FAX Ext.: Tech Email:nis...@niso.org mailto:email%3anis...@niso.org Name Server:DNS.OCLC.ORG http://DNS.OCLC.ORG Name Server:DNS2.OCLC.ORG http://DNS2.OCLC.ORG Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: On Fri, May 15, 2009 at 10:39 AM, Phil Adams p...@dmu.ac.uk wrote: I heard via twitter that: openurl.info http://openurl.info domain name expired on sunday! somebody messed up Regards, Philip Adams Senior Assistant Librarian (Electronic Services Development) De Montfort University Library 0116
[CODE4LIB] Fwd: OPEN POSITION: Linked Data in Digital Libraries
Seems like Linked Data + Library + Vienna might be of interest to somebody here. -Ross. -- Forwarded message -- From: Bernhard Haslhofer bernhard.haslho...@univie.ac.at Date: Fri, May 15, 2009 at 3:46 AM Subject: OPEN POSITION: Linked Data in Digital Libraries To: Linked Data community public-...@w3.org Hello, if somebody feels like moving to Vienna to work in a digital library project where we will definitely do some Linked Data research, please let me know. Best, Bernhard - The Multimedia Information Systems Group (http://www.cs.univie.ac.at/mis) at the University of Vienna / Austria is looking for an excellent candidate to work as a PhD researcher in the EU eContentPlus project EuropeanaConnect. The objective of EuropeanaConnect (http://www.europeanaconnect.eu/) is to deliver core components which are essential for the realization of the European Digital Library (Europeana) as a truly interoperable, multilingual and user-oriented service for all European citizens. The project will provide the technologies and resources to semantically enrich vast amounts of digital content in Europeana. This will enable semantically based content discovery including support for advanced searching and browsing, allowing for delivery of enhanced services and making Europeana content more accessible, reusable and exploitable. We expect the applicant to work in the following areas: - Web-based knowledge organization systems (e.g., Linked Data) - metadata registries - persistent digital object identifiers - multimedia annotations The ideal candidate holds a MS degree in Computer Science or related field and is able to consider both theoretical and practical/implementation aspects in her/his work. Fluent english communication and programming skills are fundamental requirements. 
Preferably the candidate has a background in one of the following fields: - semantic technologies (RDF, SKOS, etc.) - metadata management - multimedia computing The position starts as soon as possible and is full-time (40h/week) for the duration of the project until Oct 2011. Review of applications will begin immediately and will continue until the position is filled. The successful candidate will work closely with international partners and will have the opportunity to pursue PhD work within the scope of the project. We invite interested applicants to send their resume, including a pointer to their previous / current work and publications, to sekretar...@mminf.univie.ac.at, Reference No: 396/MIS/0109. The University of Vienna is an Equal Opportunities Employer. Women therefore are especially encouraged to apply. In case of equal qualification, women applying are to be given priority unless reasons specific to an individual male candidate tilt the balance in his favor according to judgments of the EU Court of Justice. - __ Research Group Multimedia Information Systems Department of Distributed and Multimedia Systems Faculty of Computer Science University of Vienna Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649 E-Mail: bernhard.haslho...@univie.ac.at WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
[CODE4LIB] Diebold-o-tron-o-matic IG
Hi everybody. We're probably 6 months (or less) from the voting season in Code4libya and I want to preemptively counter the catcalls, jeers, calls for the Drupal voting module, etc., before we're 4 days out from the first vote opening. So, if you're interested in participating in this, let me know. If you're interested in /leading/ this, /please/ let me know, because I'm perfectly happy just firing up the Diebold-o-tron-o-matic for another year, so if you've got a real bone to pick with how things work, stand and deliver. Thanks, -Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Tue, May 12, 2009 at 6:21 AM, Jakob Voss jakob.v...@gbv.de wrote: Ross Singer wrote:

<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>

I generally agree with this, but what about formats that aren't XML or RDF based? How do I also say that you can grab my text/x-vcard? Or my application/marc record? There is still lots of data I want that doesn't necessarily have these characteristics. In my blog posting I included a way to specify MIME types (such as text/x-vcard or application/marc) as URIs. According to RFC 2220 the application/marc type refers to "the harmonized USMARC/CANMARC specification" - whatever this is - so the MIME type can be used as a format identifier. For vCard there is an RDF namespace and a (not very nice) XML namespace: http://www.w3.org/2001/vcard-rdf/3.0# vcard-temp (see http://xmpp.org/registrar/namespaces.html) This is vCard as RDF, not vCard the format (which is text based). It would be the equivalent of saying "here's an hCard, it's the same thing, right?" although the reason I may be requesting a vCard in its native format is because I have a vCard parser or an application that consumes them (Exchange, for example). That depends whether you want to be taken seriously outside the library community and target the web as a whole or not. My point is that there's a step before that, possibly, where the theory behind unAPI, Jangle, whatever, is tested to even see if it's going in the right direction before writing it up formally as an RFC. I don't think the lack of adoption of unAPI has anything to do with the prose of its specification document. The RFC format is useful for later adopters, but people that, say, jumped on the Atom syndication format as a good idea didn't need an RFC first; they developed a spec, /then/ wrote the standard once they had an idea of how it needed to work. -Ross.
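A minimal sketch of how a client might consume a formats list like the one quoted above, keying on the proposed uri attribute rather than the short name. The sample document and the format_name_for_uri helper are illustrative assumptions, not part of the unAPI spec:

```python
# Sketch: pick the unAPI short name to request for a given format URI.
# The <formats> shape follows the example in the message above; the
# helper function is hypothetical.
import xml.etree.ElementTree as ET

UNAPI_NS = "http://unapi.info/"

formats_xml = """<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  <format name="vcard" type="text/x-vcard"/>
</formats>"""

def format_name_for_uri(xml_text, wanted_uri):
    """Return the short name to request for a given format URI, or None."""
    root = ET.fromstring(xml_text)
    for fmt in root.findall(f"{{{UNAPI_NS}}}format"):
        if fmt.get("uri") == wanted_uri:
            return fmt.get("name")
    return None

print(format_name_for_uri(formats_xml, "http://xmlns.com/foaf/0.1/"))  # foaf
```

Note that the vcard entry above carries only a MIME type, which is exactly the gap Jakob raises: a text-based format has no namespace URI to match on.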
Re: [CODE4LIB] Formats and its identifiers
On Mon, May 11, 2009 at 9:53 AM, Jakob Voss jakob.v...@gbv.de wrote: That's your interpretation. According to the schema, the MODS format *is* either a single mods-element or a modsCollection-element. That's exactly what you can refer to with the namespace identifier http://www.loc.gov/mods/v3. Agreed. The same is true, of course, of MARC and, by extension, MARCXML. Part of the format is that it can be one record or multiple. I don't think this is a particularly strong argument against using the namespace as an identifier. The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' does not identify the top level element but the MODS *format* (in any of the versions 3.0-3.4) itself. This format *includes* the top level element 'mods'. I'm not really sure of the changes between MODS v.3.0-3.3 -- are they basically backwards and forwards compatible? I imagine there are a lot of cases where the client doesn't care what point release of MODS the thing is serialized as, just that it's MODS and that it can find generally what it's looking for in that structure, right? -Ross.
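The "client doesn't care about the point release" position can be sketched like this: key on the namespace to decide "this is MODS," and read the record-level version attribute only when the point release actually matters. The sample record is invented for illustration:

```python
# Sketch: the namespace identifies the format; the version attribute on
# the root element carries the point release. Sample record is invented.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

record = """<mods xmlns="http://www.loc.gov/mods/v3" version="3.3">
  <titleInfo><title>Example</title></titleInfo>
</mods>"""

root = ET.fromstring(record)
is_mods = root.tag == f"{{{MODS_NS}}}mods"  # same test for any 3.x release
point_release = root.get("version")          # "3.3", if you care
print(is_mods, point_release)
```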
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ideally, though, if we have some buy-in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross. On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote: I am pleased to disagree to various levels of 'strongly' (if we can agree on a definition for it :-). Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he supplied -snip We could have something like:

<http://purl.org/DataFormat/marcxml>
    skos:prefLabel "MARC21 XML" ;
    skos:notation <info:srw/schema/1/marcxml-v1.1> ;
    skos:notation <info:ofi/fmt:xml:xsd:MARC21> ;
    skos:notation <http://www.loc.gov/MARC21/slim> ;
    skos:broader <http://purl.org/DataFormat/marc> ;
    skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd have a real way of knowing what they actually mean. Maybe this is what you mean by a crosswalk. --end Is exactly what I meant by a crosswalk. Basically a translating dictionary which allows any entity (system or person) to relate the various identifiers. I would love to see a single unified set of identifiers; my life as a wrangler of record semantics would be so much easier. But I don't see it happening. That does not mean we should not try. Even a unification in our space (and if not in the library/information space, then where? as Mike said) reduces the larger problem.
However I don't believe it is a scalable solution (which may not matter - if all of a group of users agree, then why not leave them to it?) as, at any time, one group/organisation/person/system could introduce a new scheme, and a world view which relies on unified semantics would no longer be viable. Which means until global unification on an object (better, a (large) set of objects) is achieved it will be necessary to have the translating dictionary and systems which know how to use it. Unification reduces Ray's list of 15 alternative URIs to 14 or 13 or whatever. As long as that number is > 1, translation will be necessary. (I will leave aside discussions of massive record bloat, continual system re-writes, the politics of whose view prevails, the unhelpfulness of compromises for joint solutions, and so on.) Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Friday, May 01, 2009 02:36 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Jonathan Rochkind writes: Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of most library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted over-engineered too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with. I strongly, STRONGLY agree with this.
It's exactly what I was about to write myself, in response to Peter's message, until I saw that Jonathan had saved me the trouble :-) Let's solve the problem that's in front of us right now: bring SRU into harmony with OpenURL in this respect, and the very act of doing so will lend extra legitimacy to the agreed-on identifiers, which will then be more strongly positioned as The Right Identifiers for other initiatives to use. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "You cannot really appreciate Dilbert unless you've read it in the original Klingon." -- Klingon Programming Mantra
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I agree that most software probably won't do it. But the data will be there and free and relatively easy to integrate if one wanted to. In a lot of ways, Jonathan, it's got Umlaut written all over it. Now to get to Jonathan's point -- yes, I think the primary goal still needs to be working towards bringing use of identifiers for a given thing to a single variant. However, we would obviously have to know what the options are in order to figure out what that one is -- while we're doing that, why not enter the different options into the registry and document them in some way (such as, who uses this variant?). Voila, we have a crosswalk. Of course, the downside is that we technically also have a new URI for this resource (since the skos:Concept would need to have a URI), but we could probably hand-wave that away as the id for the registry concept, not the data format. So -- we seem to have some agreement here? -Ross. On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind rochk...@jhu.edu wrote: From my perspective, all we're talking about is using the same URI to refer to the same format(s) across the library community standards this community generally can control. That will make things much easier for developers, especially but not only when building software that interacts with more than one of these standards (as client or server). Now, once you've done that, you've ALSO set the stage for that kind of RDF scenario, among other RDF scenarios. I agree with Mike that that particular scenario is unlikely, but once you set the stage for RDF experimentation like that, if folks are interested in experimenting (and many in our community are), maybe something more attractively useful will come out of it. Or maybe not. Either way, you've made things easier and more inter-operable just by using the same set of URIs across multiple standards to refer to the same thing. So, yeah, I'd still focus on that, rather than any kind of 'crosswalk', RDF or not.
It's the actual use case in front of us, in which the benefit will definitely be worth the effort (if the effort is kept manageable by avoiding trying to solve the entire universe of problems at once). Jonathan Mike Taylor wrote: So what are we talking about here? A situation where an SRU server receives a request for response records to be delivered in a particular format, it doesn't recognise the format URI, so it goes and looks it up in an RDF database and discovers that it's equivalent to a URI that it does know? Hmm ... it's crazy, but it might just work. I bet no-one does it, though. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "Someday, I'll show you around monster-free Tokyo" -- dialogue from Gamera: Guardian of the Universe Peter Noerr writes: I agree with Ross wholeheartedly. Particularly in the use of an RDF based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. Semantics (as in Web) has been exercising my thoughts recently and the problems we have here are writ large over all that the SW people are trying to achieve. Perhaps we can help... Peter
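Mike's lookup scenario can be sketched without any RDF machinery at all. Here the registry is a hard-coded stand-in for the SKOS concept discussed earlier in the thread, and the translate function is an illustrative assumption, not anyone's proposed API:

```python
# Sketch: a server receives a format URI it doesn't recognise and
# consults a registry to find an equivalent it does support. The
# identifiers come from the examples in this thread; the registry
# structure and translate() are hypothetical.
REGISTRY = {
    "http://purl.org/DataFormat/marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
        "http://www.loc.gov/MARC21/slim",
    },
}

def translate(unknown_uri, known_uris):
    """Map an unrecognised format URI onto one the server supports."""
    for concept, notations in REGISTRY.items():
        if unknown_uri == concept or unknown_uri in notations:
            for candidate in {concept} | notations:
                if candidate in known_uris:
                    return candidate
    return None

# A server that only knows the SRU identifier can still satisfy an
# OpenURL-flavoured request:
print(translate("info:ofi/fmt:xml:xsd:MARC21",
                {"info:srw/schema/1/marcxml-v1.1"}))
```

The point of the thread stands either way: if everyone converged on one URI per format, this lookup step would disappear entirely.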
Re: [CODE4LIB] registering info: uris?
So hey, I know nobody wanted to see this thread revived, but I'm hoping you info URI folks can clear something up for me. So I'm trying to gather together a vocabulary of identifiers to unambiguously describe the format of the data you would be getting in a Jangle feed or an unAPI response (or any other variation on this theme): "I have a MODS document and I want *you* to have it too!" Jakob Voss made the (reasonable) suggestion that rather than create yet another identifier or registry to describe these formats, it would make sense to use the work that the SRU: http://www.loc.gov/standards/sru/resources/schemas.html or OpenURL: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats communities have already done. Which makes a lot of sense. It would be nice to use the same identifier in Jangle, SRU and OpenURL to say that this is a MARCXML or ONIX record. Except that OpenURL and SRU /already use different info URIs to describe the same things/: info:srw/schema/1/marcxml-v1.1 vs. info:ofi/fmt:xml:xsd:MARC21, or info:srw/schema/1/onix-v2.0 vs. info:ofi/fmt:xml:xsd:onix. What is the rationale for this? How do we keep up? Are they reusable? Which one should be used? Doesn't this pretty horribly undermine the purpose of using info URIs in the first place? Is anybody else interested in working on a way to unambiguously say "here is a Dublin Core resource as XML, but it is not OAI DC" or "this is text/x-vcard, it conforms to vCard 3.0" in a way that we can reuse among all of our various ways of sharing data? Thanks, -Ross.