Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote: I’m sure this is way too much info for most (or all) on this list, but in case it is helpful, I thought I’d throw it out there. I disagree. I think this was fantastic and most enlightening. Most of us deal with this stuff all the time, yet we (obviously) have zero idea how it actually works, so it's nice to be schooled (and have this mini-lesson in LCSH contextually in the mailing list archives). Thanks for putting this out there, Kelley. -Ross.
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens o...@ostephens.com wrote: Then obviously I lose the context of the full heading - so I also want to look for Education--England--Finance (which I won't find on id.loc.gov as not authorised) At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms: Education--England (not authorised) Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008) My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements? I would definitely ask this question somewhere other than Code4lib (autocat, maybe?), since I think the answer is more complicated than this (although they could validate/invalidate your assumption about whether or not this approach would get you close enough). My understanding is that Education--England--Finance *is* authorized, because Education--Finance is and England is a free-floating geographic subdivision. Because it's an authorized heading, Education--England--Finance is, in fact, an authority. The problem is that free-floating subdivisions allow an almost infinite number of permutations, so there aren't LCCNs issued for them. This is where things get super-wonky. It's also the reason I initially created lcsubjects.org, specifically to give these (and, ideally, locally controlled subject headings) a publishing platform/centralized repository, but it quickly grew to be more than just a side project. There were issues of how the data would be constructed (esp. since, at the time, I had no access to the NAF), how to reconcile changes, provenance, etc. Add to that the fact that two years ago there wasn't much linked library data going on, and it was really hard to justify the effort. But, yeah, it would be worth running your ideas by a few catalogers to see what they think. -Ross.
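Owen's combination strategy can be sketched in a few lines of Ruby. This is purely illustrative: the method name and the `--` subdivision separator are my own conventions, not part of ruby-marc or any id.loc.gov API. Given a composite heading, it generates every sub-combination of subdivisions that keeps the leading topical term, so each candidate could then be checked against id.loc.gov.

```ruby
# Generate candidate headings from a subdivided LCSH string, always keeping
# the leading topical term (per Owen's theory above). The "--" separator and
# method name are assumptions for illustration.
def candidate_headings(heading)
  topic, *subdivisions = heading.split('--')
  candidates = []
  # every non-empty, order-preserving subset of the subdivisions
  (1..subdivisions.length).each do |n|
    subdivisions.combination(n).each do |combo|
      candidates << ([topic] + combo).join('--')
    end
  end
  candidates
end

candidate_headings('Education--England--Finance')
# => ["Education--England", "Education--Finance", "Education--England--Finance"]
```

Each candidate would still need to be resolved against id.loc.gov (or validated against subdivision rules) before being asserted, per the caveats in the thread.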
Re: [CODE4LIB] LCSH and Linked Data
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote: 1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields). 2. cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z. Yeah, this could get ugly pretty fast. It's a bit unclear to me what the distinction is between identical terms in both the geographic areas and the country codes (http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and http://id.loc.gov/vocabulary/countries/enk). Well, in LC's current representation, there *is* no distinction; they're both just skos:Concepts that (by virtue of skos:exactMatch) are effectively interchangeable. See also http://id.loc.gov/vocabulary/geographicAreas/fa and http://id.loc.gov/authorities/sh85009230#concept. You have a single institution minting multiple URIs for what is effectively the same thing (albeit in different vocabularies), although, ironically, nothing points at any actual real world objects. VIAF doesn't do much better in this particular case (there are lots of examples where it does, mind you): http://viaf.org/viaf/142995804 (see: http://viaf.org/viaf/142995804/rdf.xml). We have all of these triangulations around the concept of England or the Atlas mountains, but we can't actually refer to England or the Atlas mountains. Also, I am not somehow above this problem, either. With the linked MARC codes lists (http://purl.org/NET/marccodes/), I had to make a similar decision; I just chose to go the opposite route: define them as things, rather than concepts (http://purl.org/NET/marccodes/gacs/fa#location, http://purl.org/NET/marccodes/gacs/e-uk-en#location, http://purl.org/NET/marccodes/countries/enk#location, etc.), which presents its own set of problems (http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing no matter how liberal your definition).
At some point, it's worth addressing what these things actually *are* and, if indeed they are effectively the same thing, whether it's worth preserving these redundancies, because I think they'll cause grief in the future. -Ross.
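The redundancy being described can be made concrete with a tiny sketch that emits the SKOS triples in question as Turtle. This is not any official id.loc.gov serialization, just an illustration of two URIs related only by skos:exactMatch, with neither pointing at a real-world thing:

```ruby
# Sketch of the situation described above: two URIs, same institution,
# different vocabularies, tied together only by skos:exactMatch. Plain
# string-building; no RDF library assumed.
def exact_match_turtle(uri_a, uri_b)
  <<~TTL
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    <#{uri_a}> a skos:Concept ; skos:exactMatch <#{uri_b}> .
    <#{uri_b}> a skos:Concept ; skos:exactMatch <#{uri_a}> .
  TTL
end

puts exact_match_turtle(
  'http://id.loc.gov/vocabulary/geographicAreas/e-uk-en',
  'http://id.loc.gov/vocabulary/countries/enk'
)
```

Note what's missing from the output: any triple asserting that either concept is about the actual place, which is exactly the gap the email is pointing at.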
Re: [CODE4LIB] Documentation request for the marc gem
Thanks, Ed. That would have been a useful tidbit for me to have added :) Also, if there's interest, we can set up the Github Wiki for ruby-marc. There is some functionality that would be difficult to explain (including the pros and cons) in the rdocs, such as the XML parsers (and how to write new ones), and there are some caveats on when to use field maps in MARC::Record and when find/find_all works better. Anyway, this seems like it might be useful, and if others think so, too, well, let me know! Thanks! -Ross. On Wed, Mar 16, 2011 at 6:02 AM, Ed Summers e...@pobox.com wrote: Hi Tony, Just in case it wasn't obvious, the source code is on GitHub [1]. As Ross said, please consider forking it and sending a pull request for any documentation improvements you want to do. //Ed [1] https://github.com/ruby-marc/ruby-marc On Tue, Mar 15, 2011 at 3:18 PM, Ross Singer rossfsin...@gmail.com wrote: Hi Tony, I'm glad that ruby-marc appears to be generally useful. Another (even simpler) way to do what you want is: record.to_marc Which, I think, would do the same thing you're doing with MARC::Writer.encode. If you want to write up a block of text to plop into the README, feel free to send me some copy (wholesale edits also welcome). Thanks, -Ross. On Tue, Mar 15, 2011 at 2:40 PM, Tony Zanella tony.zane...@gmail.com wrote: Hello all, If I may suggest adding to the documentation for the marc gem (http://marc.rubyforge.org/)... Currently, the documentation gives examples for how to read, create and write MARC records. The source code also includes an encode method in MARC::Writer, which came in handy for me when I needed to send an encoded record off to be archived on the fly, without writing it to the filesystem. That method isn't in the documentation, but it would be nice to see there! It could be as simple as: # encoding a record MARC::Writer.encode(record) Thanks for your consideration! Tony
Re: [CODE4LIB] App Recommendations
Another possible alternative to Marginalia might be Markup.io: http://markup.io/ which I'm happy to plug because besides merely being cool, it was made by some folks that live in my neighborhood. It may not be exactly what you're looking for, though, since it's not necessarily text-centric. -Ross. On Thu, Mar 10, 2011 at 3:18 PM, Nathan Tallman ntall...@gmail.com wrote: Hi Code4Libers, I'm usually just a lurker on here (love to follow the threads and learn new things), but I'm presently in need of some recommendations. Might I seek the collective wisdom of the list? There are two things I'm seeking tech solutions for. The first is web app/widget/AJAX similar that allows users to make annotations on a web page, specifically finding aids in my case. I've already taken a look at Marginalia, but the demo had problems in Google Chrome. The other thing I need is an easy to use project management application, desktop or web. Does anyone have any favorites? Thank you Code4Lib! I hope to one day have enough tech skills to attend Code4Lib with pride! Right now, I'm about 2/3rds there ;) Nathan Tallman Associate Archivist American Jewish Archives
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Cindy, sorry, I realize that was vague. I have shell access on Site5, but since you're using shared resources, they monitor your CPU/memory usage. During high volume on a particular server, they'll kill processes that are running to make sure they can meet demands. This *could* happen when you're trying to compile something, which tends to be CPU-intensive, although it just depends. I've had their trigger kick in while trying to install ruby gems, although it's completely unpredictable (that is, based on all sorts of variables) - sometimes the gems install with no problem, other times they're killed. Compiling yaz is probably less of an issue (the makefile calls lots of things that run intensely, but quickly) than the pecl install of php/yaz. Running things in nice (http://linux.die.net/man/2/nice) probably helps your chances, but YMMV. I don't think this policy is exclusive to Site5; pretty much all of the major shared web hosting providers will have something similar in place, otherwise users could constantly have processes running in shells. Like I said, though, it shouldn't be a problem, it just might take a few tries (which will be less work, in the long run, than running your own VPS). -Ross. On Tue, Mar 8, 2011 at 10:05 AM, Cindy Harper char...@colgate.edu wrote: Sorry - what do you mean by triggers their usage monitor - CPU usage above a certain threshold? Or they don't allow compiles? I spoke with Bluehost, and they indicated that if I got SSH access, I could try to compile it myself. I'll check to see if this is possible with Lunarpages, which we now have accounts with. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 1:58 PM, Ross Singer rossfsin...@gmail.com wrote: Cindy, I think this might be possible, depending on the provider.
I have a site on Site5 and this seems pretty doable (it looks like I might have even tried this at some point, since I seem to have a compiled version of yaz in my home directory). It would probably take some rooting around in the forums to see how people are successfully installing PECL extensions and it might take a few tries to compile yaz successfully (since if it triggers their usage monitor, they'll kill the process), but I think it would be worth a shot. I would definitely recommend this before jumping to a VPS (and let's be realistic, everybody: if you're being this blasé about running a VPS, you are either investing some time/expertise sysadmining it or you have an insecure server waiting to be exploited). Good luck! -Ross. On Mon, Mar 7, 2011 at 1:17 PM, Cindy Harper char...@colgate.edu wrote: I guess I was hoping to have service such as that provided by my current hosting service, where security, etc., updates for LAMP are all taken care of by the host. Any recommendations along those lines? One that provides that and still lets me install what I want? My service suggested that I go to a VPS account, where I'd have to do my own updates. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 11:00 AM, Han, Yan h...@u.library.arizona.edu wrote: You can just buy a node from a variety of cloud providers such as Amazon EC2, Linode etc. (It is very easy to build anything you want). Yan -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 10:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] LAMP Hosting service that supports php_yaz? At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] facets in Atom feeds
So that seems to just be using the atom:category element, which is clever, but it wouldn't give you facet counts for the total results set (just for the returned page). It's possible to have categories across the entire result set (they'd be at the feed level, rather than the entry level), but you wouldn't have any counts or links for your filtered search results and you'd need some way to turn the scheme attribute into a facet field, although all of these are pretty easily achievable (they'd just really need an XML namespace and some consensus). Take: <category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/books/2008#volume'/> You could easily do something like: <category scheme='http://example.org/facets/fields#subject' term='History' ex:facetCount='1024' ex:href='http://example.org/search?q=your+search&amp;fct[subject]=History'/> or whatever. -Ross. On Thu, Mar 3, 2011 at 3:06 PM, Peter Murray peter.mur...@lyrasis.org wrote: That's pretty cool, but I had to fire up Parallels on my Mac to see it in MSIE. For those that may not have Windows readily available, this is what it looks like: http://twitpic.com/45r6sn Peter On Mar 3, 2011, at 1:51 PM, Jonathan Rochkind wrote: Someone recently on this list was saying something about ways to embed facets in for instance Atom feeds. I was reminded of that, because checking out an Atom feed from Google Books Data API, in Internet Explorer... Internet Explorer displays 'facet' type restrictions for it, under a heading Filter by category. It also displays sort options; apparently somehow the feed is advertising its sort options too in a way that a client like IE can act upon? Haven't looked into the details, but here's an example feed: http://books.google.com/books/feeds/volumes?q=LCCN07037314 Look at it in IE for instance. So whatever's being done here is apparently already somewhat standard, at least IE recognizes what Google does? I'd encourage SRU or whoever to follow their lead.
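Generating such an extended category element is trivial; here's a minimal Ruby sketch. The `ex:` namespace, the `facetCount`/`href` attribute names, and the URL pattern are the hypothetical extension proposed above, not a real Atom or Google Books vocabulary:

```ruby
# Build a hypothetical faceted atom:category element as a string. The ex:
# attributes are the invented extension from the email, not a real standard;
# a real feed would also need to declare the ex: namespace on the feed element.
def facet_category(scheme:, term:, count:, href:)
  %Q{<category scheme="#{scheme}" term="#{term}" } +
  %Q{ex:facetCount="#{count}" ex:href="#{href}"/>}
end

facet_category(scheme: 'http://example.org/facets/fields#subject',
               term: 'History', count: 1024,
               href: 'http://example.org/search?q=your+search&amp;fct[subject]=History')
```

(String interpolation is fine for a sketch; production code should XML-escape the attribute values.)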
[I agree that simply copying the Solr API for a standard like SRU is not the way to go -- Solr is an application that supports various low-level things that are not appropriate in that level of detail for a standard like SRU or what have you, at least not until they've been shown to be needed.] -- Peter Murray peter.mur...@lyrasis.org tel:+1-678-235-2955 Ass't Director, Technology Services Development http://dltj.org/about/ Lyrasis -- Great Libraries. Strong Communities. Innovative Answers. The Disruptive Library Technology Jester http://dltj.org/ Attrib-Noncomm-Share http://creativecommons.org/licenses/by-nc-sa/2.5/
Re: [CODE4LIB] GPL incompatible interfaces
On Fri, Feb 18, 2011 at 9:30 AM, Eric Hellman e...@hellman.net wrote: Since the Metalib API is not public, to my knowledge, I don't know whether it gets disclosed with an NDA. And you can't run or develop Xerxes without an ExLibris License, because it depends on a proprietary and unspecified data set. This is a very good point (and neither here nor there on the licensing issue). Ex Libris, in particular, has always had an awkward relationship between the NDA-for-customers-eyes-only policy regarding their X-Services documentation and their historic tolerance for open source applications built upon said services. The latter undermines the former significantly, since the documentation could theoretically be reverse-engineered if the open source projects' uses of it are comprehensive enough. I'll leave whether or not having an NDA on API documentation makes sense as an exercise for the reader. It does mean, however, that Ex Libris could at any point claim that these projects violate those terms, which is a risk, although probably a risk worth taking. On the opposite end of the spectrum, you have SirsiDynix, who refuse the distribution of applications written using their Symphony APIs to anybody but SD customers-in-good-standing-that-have-received-API-training. While SD's position is certainly draconian (and, in my opinion, rather counter-productive), it does let the developer know where she or he stands with no sense of ambiguity coming from the company. -Ross.
Re: [CODE4LIB] EZB
On Thu, Feb 17, 2011 at 11:16 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Interesting, does their link resolver API do article-level links, or just journal title level links? I/you/one could easily write a plugin for Umlaut for their API, which would be an interesting exercise. I think it would also be interesting to make the data available for download/reuse, if possible. -Ross. On 2/17/2011 1:18 AM, Markus Fischer wrote: The cheapest and best A to Z list i know is the german EZB: http://rzblx1.uni-regensburg.de/ezeit/index.phtml?bibid=A&colors=7&lang=en This list is maintained by hundreds of libraries. You just mark those journals you have licensed and that's it. Not very widely known: they do also provide an API which you can use as a free linkresolver. There are free tools you can plug into this API and you've got your linkresolver. The list is incredibly accurate and you'll have almost no effort: any change made by one library is valid for all. Let me know if you need more information. Markus Fischer Am 16.02.2011 22:18, schrieb Michele DeSilva: Hi Code4Lib-ers, I want to chime in and say that I, too, enjoyed the streaming archive from the conference. I also have a question: my library has a horribly antiquated A to Z list of databases and online resources (it's based in Access). We'd like to do something that looks more modern and is far more user friendly. I found a great article in the Code4Lib journal (issue 12, by Danielle Rosenthal & Mario Bernado) about building a searchable A to Z list using Drupal. I'm also wondering what other institutions have done as far as in-house solutions. I know there're products we could buy, but, like everyone else, we don't have much money at the moment. Thanks for any info or advice! Michele DeSilva Central Oregon Community College Library Emerging Technologies Librarian 541-383-7565 mdesi...@cocc.edu
Re: [CODE4LIB] Do you have Project Gutenberg (or other public domain e-books) MARC Records in your OPAC?
http://www.gutenberg.org/wiki/Main_Page Project Gutenberg is the place where you can download over 33,000 free ebooks to read on your PC, iPad, Kindle, Sony Reader, iPhone, Android or other portable device. Over 100,000 free ebooks are available through our Partners, Affiliates and Resources. http://www.gutenberg.org/wiki/Gutenberg:Partners%2C_Affiliates_and_Resources -Ross. On Thu, Feb 17, 2011 at 12:35 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hmm, what does ebook mean in this context exactly? Gutenberg has a heck of a lot more than 35k digital texts of books, I consider them all 'ebooks'. What does Gutenberg consider 'ebooks' exactly? On 2/17/2011 12:29 PM, Charles Ledvina wrote: Hello Matt: There are 35,224 records in this bzip file from Project Gutenberg: http://www.gutenberg.org/feeds/catalog.marc.bz2 from this page: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs It is their complete eBook collection and they say the file is updated daily. --Charles Ledvina On Wed, 16 Feb 2011 17:03:58 -0500, Matt Amory matt.am...@gmail.com wrote: If so can you send me a URL? Thanks much! Matt Amory On Wed, Feb 16, 2011 at 4:18 PM, Michele DeSilva mdesi...@cocc.edu wrote: Hi Code4Lib-ers, I want to chime in and say that I, too, enjoyed the streaming archive from the conference. I also have a question: my library has a horribly antiquated A to Z list of databases and online resources (it's based in Access). We'd like to do something that looks more modern and is far more user friendly. I found a great article in the Code4Lib journal (issue 12, by Danielle Rosenthal & Mario Bernado) about building a searchable A to Z list using Drupal. I'm also wondering what other institutions have done as far as in-house solutions. I know there're products we could buy, but, like everyone else, we don't have much money at the moment. Thanks for any info or advice! Michele DeSilva Central Oregon Community College Library Emerging Technologies Librarian 541-383-7565 mdesi...@cocc.edu
Re: [CODE4LIB] Unexpected ruby-marc behavior
No, that's expected behavior (and how it's always been). You'd need to do reader.rewind to put your enumerator cursor back to 0 to run back over the records. It's basically an IO object (since that's what it expects as input) and behaves like one. -Ross. On Thu, Jan 27, 2011 at 2:03 PM, Cory Rockliff rockl...@bgc.bard.edu wrote: So I was taking ruby-marc out for a spin in irb, and encountered a bit of a surprise. Running the following: require 'marc' reader = MARC::Reader.new('filename.mrc') reader.each {|record| puts record['245']} produces the expected result, but every subsequent call to reader.each {|record| puts record['245']} returns nil. Am I missing something obvious? I don't remember this being the case before. Thanks! Cory [running ruby-marc off the github repo / os x 10.6.5 / ruby 1.9.2 via rvm / rubygems via homebrew]
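Since MARC::Reader behaves like the IO it wraps, the effect is easy to demonstrate without the marc gem at all: a StringIO stands in for the reader here to show why the second pass comes back empty and how rewind fixes it.

```ruby
require 'stringio'

# MARC::Reader wraps an IO, so iteration moves a cursor: a second #each
# starts wherever the first one stopped. StringIO shows the same behavior
# (and the rewind fix) without needing the marc gem.
io = StringIO.new("record1\nrecord2\n")
first_pass  = io.each_line.map(&:chomp)  # => ["record1", "record2"]
second_pass = io.each_line.map(&:chomp)  # => [] -- cursor is already at EOF
io.rewind                                # put the cursor back to 0
third_pass  = io.each_line.map(&:chomp)  # => ["record1", "record2"] again
```

The same pattern applies to the original question: call `reader.rewind` between passes, or read the records into an array once and iterate over that.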
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
On Wed, Dec 15, 2010 at 10:20 AM, Karen Coyle li...@kcoyle.net wrote: Where I think we run into a problem is when we try to use FRBR as a record structure rather than conceptual guidance, which is what you allude to. This is the place where some implementations have decided to either merge Work and Expression or Expression and Manifestation because the Expression layer seems to make user displays more difficult. (I have also heard that the XC project found that putting the FRBR levels back together for display was inefficient.) Right, and I think this leads to all sorts of other ugliness, too: FRBR-izing aggregations of things (musical albums, anthologies, conference proceedings, etc.) is a potential UX nightmare (from both an end user AND data entry perspective). That said, there are enormous benefits to modeling these things, too: I am not suggesting we sweep them under the rug (which is basically what we've historically done), but somehow we're going to need to figure out an acceptable balance. -Ross.
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
On Tue, Dec 14, 2010 at 4:03 PM, McDonald, Stephen steve.mcdon...@tufts.edu wrote: I couldn't really say, and I'm not sure that it matters. Libraries have no need to worry about Works which have no Manifestation, so in practice I don't find it hard to recognize the Work-Manifestation relationship in the materials we actually work with. This is a pretty narrow view of what libraries need to worry about. There are lots of Works that have no Manifestations; antiquity is littered with them (references to things that only existed in the library of Alexandria, etc.). Just because they're not on our shelf (or any shelf) doesn't mean we shouldn't acknowledge them. -Ross.
Re: [CODE4LIB] code4lib 2011: Hotel registration
But... then I'd have to talk to a human being! -Ross. On Mon, Dec 13, 2010 at 1:06 PM, Ranti Junus ranti.ju...@gmail.com wrote: Folks, perhaps it'd be easier if you call the hotel instead, if the website doesn't work well. Please see the info from Andrew Darby below. thanks, ranti. -- Forwarded message -- From: Andrew Darby ada...@ithaca.edu Date: Mon, Dec 13, 2010 at 12:59 PM Subject: Re: Trouble with registration site To: code4lib...@googlegroups.com I'd add that I had problems with the hotel online reservation . . . fill out enormous form, get a no rooms available message, redo everything, repeat 3 or 4 times without success. Much easier to call the hotel directly . . . they were very nice. Phone: 812-856-6381 (but they might redirect you to another #) -- Bulk mail. Postage paid.
Re: [CODE4LIB] MARCXML - What is it for?
Alex, I think the problem is data like this: http://lccn.loc.gov/96516389/marcxml And while we can probably figure out a pattern to get the semantics out of this record, there is no telling how many other variations exist within our collections. So we've got lots of this data that is both hard to parse and, frankly, hard to find (since it has practically zero machine readable data in fields we actually use) and it needs to coexist with some newer, semantically richer format. What I'm saying is that the library's legacy data problem is almost to the point of being existential. This is certainly a detriment to forward progress. Analogously (although at a much smaller scale), my wife and I have been trying for about 2 years to move our checking account from our out of state bank to something local. The problem is that we have built up a lot of infrastructure around our old bank (direct deposit and lots of automatic bill pay, etc.): migration would not only be time consuming, but any mistakes could potentially be quite expensive, and we have a lot of uncertainty about how long it would actually take to migrate (and how that might affect the flow of payments, etc.). It's been, to date, easier for us just to drive across the state line (despite the fact that it's way out of our way to anywhere) rather than actually deal with it. In the meantime, more direct bill pay things have been set up and whatnot, making our eventual migration that much more difficult. I do think it would be useful to figure out what exactly in our legacy data is found only in libraries (that is, we could ditch this shoddy The Last Waltz record and pull the data from LinkedMDB or Freebase or somewhere) and determine the scale of the problem that only we can address, but even just this environmental scan is a fairly large undertaking. -Ross.
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote: Here, I think you're guilty of radically underestimating lots of people around the library world. No one thinks MARC is a good solution to our modern problems, and no one who actually knows what MARC is has trouble understanding MARC-XML as an XML serialization of the same old data -- certainly not anyone capable of meaningful contribution to work on an alternative. Slow down, Tex. Lots of people in the library world is not the same as developers, or even good developers, or even good XML developers, or even good XML developers who knows what the document model imposes to a data-centric approach. The problem we're dealing with is *hard*. Mind-numbingly hard. This is no justification for not doing things better. (And I'd love to know what the hard bits are; always interesting to hear from various people as to what they think are the *real* problems of library problems, as opposed to any other problem they have) The library world has several generations of infrastructure built around MARC (by which I mean AACR2), and devising data structures and standards that are a big enough improvement over MARC to warrant replacing all that infrastructure is an engineering and political nightmare. Political? For sure. Engineering? Not so much. This is just that whole blinded by MARC issue that keeps cropping up from time to time, and rightly so; it is truly a beast - at least the way we have come to know it through AACR2 and all its friends and its death-defying focus on all things bibliographic - that has paralyzed library innovation, probably to the point of making libraries almost irrelevant to the world. 
I'm happy to take potshots at the RDA stuff from the sidelines, but I never forget that I'm on the sidelines, and that the people active in the game are among the best and brightest we have to offer, working on a problem that invariably seems more intractable the deeper in you go. Well, that's a pretty scary sentence, for all sorts of reasons, but I think I shall not go there. If you think MARC-XML is some sort of an actual problem What, because you don't agree with me the problem doesn't exist? :) and that people just need to be shouted at to realize that and do something about it, then, well, I think you're just plain wrong. Fair enough, although you seem to be under the assumption that all of the stuff I'm saying is a figment of my imagination (I've been involved in several projects lambasted because managers think MARCXML is solving some imaginary problem; this is not bullshit, but pain and suffering from the battlefields of library development), that I'm not one of those developers (or one of you, although judging from this discussion it's clear that I am not), that the things I say somehow doesn't apply because you don't agree with, umm, what I'm assuming is my somewhat direct approach to stating my heretic opinions. Alex -- Project Wrangler, SOA, Information Alchemist, UX,
Re: [CODE4LIB] Looking for OAuth experts
On Thu, Oct 14, 2010 at 11:11 AM, MJ Ray m...@phonecoop.coop wrote: Ross Singer wrote: Unlike Twitter, however, we're starting from nothing. There's nothing currently invested in ILS-DI clients that would break by committing solely to OAuth (or anything, for that matter). Are you sure there's nothing currently invested? I thought the Koha community was already implementing ILS-DI so I assume there's some client using it, as people don't tend to fund useless developments. I don't remember if any of the co-op's client libraries are using it yet, though. I am pretty certain of this. The current group is focusing on a different set of functionality (primarily around borrower account services) than the DLF group got to (which was about harvesting bib records and limited item availability support). In some ways, however, any answer I give here is correct. The DLF group provided no specifics on how to implement their functionality. HarvestBibliographicRecords could be provided via OAI-PMH or Atom; they provided a new XML format for including holdings availability, but no specification on how it would be delivered, etc. That is, the DLF ILS-DI provided guidelines of functionality that needed to be present, but not a specification on how it needed to operate (they did give recommendations). So any Koha implementation would just be an interpretation of these guidelines, but there was no specification that anyone can point to to say that an implementation is compliant. [ILS-DI] It's no longer under the auspices of the DLF and the priority of functionality has changed. [...] OK, if it's no longer under the auspices of the DLF are you still in contact with BibLibre? They are more than welcome to participate. It's not a closed process. Indeed, and I hope the reply was likewise helpful. It was. More answers than questions, which is always good! That said, I'm still not seeing the benefits of OAuth for ILS-DI compared to existing HTTP authentication and authorization methods, really.
Ok, so let me provide you with a use case: Imagine a vendor-hosted discovery service (EBSCO Discovery Service, Worldcat Local or Summon, for example, so we're not talking about any sort of 'edge case'). To use HTTP authentication, one of the following scenarios would need to be true:

- they would need to have access to a user's credentials (which is a non-starter in many places)
- they would need to be a trusted superuser of the ILS-DI API service (they authenticate elsewhere, say an SSO, then can perform lookups as anybody)
- some kind of token based access would need to be established between the discovery layer and the ILS-DI API

The last scenario is exactly what OAuth standardizes, so we're not rolling our own, niche security protocol. If you want another use case, imagine a service such as LibraryElf (http://www.libraryelf.com/). A protocol like OAuth would allow you to share your borrower account with a (useful, I think!) service like this *without* handing over your user credentials. One can also imagine all sorts of interesting services cropping up in places like LibraryThing about what you've currently got checked out, placing holds on books recommended for you, etc. HTTP Authentication/Authz pretty much assumes all services will be provided locally, which I think is a fairly antiquated assumption. -Ross. Regards, -- MJ Ray (slef), member of www.software.coop, a for-more-than-profit co-op. Past Koha Release Manager (2.0), LMS programmer, statistician, webmaster. In My Opinion Only: see http://mjr.towers.org.uk/email.html Available for hire for Koha work http://www.software.coop/products/koha
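The token-based piece OAuth standardizes can be sketched concretely. Below is a minimal Ruby implementation of the OAuth 1.0 (RFC 5849) signature step: the client proves it holds a token secret by signing each request, so the discovery layer never handles the patron's ILS password. The parameter values are purely illustrative, and this omits nonces, timestamps, and the token-granting dance itself.

```ruby
require 'openssl'

# RFC 3986 percent-encoding as required by OAuth 1.0 (unreserved chars only).
# ASCII-only sketch; multibyte input would need per-byte encoding.
def oauth_percent_encode(s)
  s.to_s.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# HMAC-SHA1 request signature per RFC 5849: the base string is the method,
# URL, and sorted params; the key combines the two shared secrets. The
# secrets and params below are illustrative, not from a real ILS-DI service.
def oauth_signature(method, url, params, consumer_secret, token_secret)
  normalized = params.sort.map { |k, v|
    "#{oauth_percent_encode(k)}=#{oauth_percent_encode(v)}"
  }.join('&')
  base = [method.upcase, url, normalized]
           .map { |part| oauth_percent_encode(part) }.join('&')
  key  = "#{oauth_percent_encode(consumer_secret)}&#{oauth_percent_encode(token_secret)}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base)
  [digest].pack('m0') # base64, no trailing newline
end

sig = oauth_signature('GET', 'http://ils.example.org/ilsdi/patron',
                      { 'oauth_token' => 'patron-token' },
                      'consumer-secret', 'token-secret')
```

The point for the use cases above: the discovery service stores only the token and token secret, which the patron can revoke at the ILS without ever changing (or revealing) their actual credentials.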
Re: [CODE4LIB] simple,flexible ILS for a small library.
You know, with Jonathan's rephrasing (if it's accurate), it crossed my mind that most ILSes that support course reserves should be able to handle this. It's extremely common for course reserves to belong to the instructor that is putting them on reserve and the ILS would need to keep track of that to return said materials back to the lending instructor. Now, I have no idea if most reserves departments do this via notes in the record or whatnot, but it might at least be a model for how this could work with a traditional ILS. Since I neither work for a library nor work for a vendor whose ILS supports course reserves (since that model doesn't really exist in the UK, apparently), I can't actually confirm how this works in practice. But I'm guessing somebody on this list can. -Ross. On Mon, Oct 4, 2010 at 4:39 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Are any currently existing open source ILSs flexible enough to support this model? I kind of doubt it. What are you are doing sounds neat, but is not typical library workflow. Tell me if I'm re-describing what you're talking about correctly: Every book in the library essentially belongs to one of the patrons. Patrons can both borrow books, and loan books to other patrons. The library is basically just a facilitator of patron-to-patron lending. So you need to know what books are out that are owned by a certain patron, as well as what books are being borrowed by a certain patron. You need to know what books are over-due that are owned by a certain patron, etc. Creating a location, branch or collection code for each patron is going to be un-manageable with more than a few dozen patrons. I don't think most existing ILS systems -- open source or not -- are going to be set up to handle that system. On the other hand, many existing ILS systems are going to have all sorts of stuff you _don't_ need, like acquisitions, and serials tracking, and such. 
I wonder if you are better served looking for software that is NOT library software to handle the actual circulation. Maybe there is some non-library software that is designed for a network of people lending stuff to each other? And then you could always put a Solr-based discovery system on top of that for actual _finding_ of books available to be borrowed, perhaps using VuFind or Blacklight or rolling your own. But the underlying tracking of circulation is actually the tricky part -- perhaps write your own custom software for that, if nothing open source can be found, but then export all items to a separate Solr-based component for the actual search engine. Jonathan ... wrote: Reading my original post, perhaps I should have made the important point more clear. My question is about an ILS suitable for a library that does not own its books, but is borrowing those books from patrons. The books all have lease end dates associated with them. Book lenders are very similar to book borrowers, and they require end of day processing to see if any of the library's books are due back to them, in the same way borrowers' books are due back to the library. So, in the last two posts which mentioned simple borrowing, that is what I am wanting, but for the library to be simply borrowing the books AND for patrons to simply borrow those same books out of the library. Book lenders and book borrowers are essentially the same, except lenders first check a book in, and the due date is when the book leaves the library, and book borrowers check books out and then back in again. Of course, many book borrowers are also lenders. Are any currently existing open source ILSs flexible enough to support this model? Sorry for the confusion, Elliot
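Elliot's model (the library borrows every book from a patron, then lends it out again) is small enough to sketch directly; this is a hypothetical toy data model for illustration, not a schema from any real ILS:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Book:
    title: str
    owner_id: str                       # the patron who lent the book to the library
    lease_end: date                     # when the book is due back to its owner
    borrower_id: Optional[str] = None   # patron currently borrowing it, if any

@dataclass
class Library:
    books: dict = field(default_factory=dict)

    def check_in_from_owner(self, book_id, book):
        # a lender "first checks a book in", as Elliot describes
        self.books[book_id] = book

    def check_out(self, book_id, patron_id):
        self.books[book_id].borrower_id = patron_id

    def check_in(self, book_id):
        self.books[book_id].borrower_id = None

    def due_back_to_owners(self, today):
        # end-of-day processing: leases expiring on or before today
        return [bid for bid, b in self.books.items() if b.lease_end <= today]

    def owned_by(self, patron_id):
        return [bid for bid, b in self.books.items() if b.owner_id == patron_id]
```

End-of-day processing is then just a scan over lease_end dates, mirroring the due-date scan an ILS already does for ordinary borrowers.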
Re: [CODE4LIB] simple,flexible ILS for a small library.
I think your functional requirement that made this non-trivial was your mention of it needing ILL functionality. There's a definite threshold that has to be crossed before you start seeing something like that integrated into the ILS. If you've got some other way to deal with ILL, I'd suggest OpenBiblio (http://obiblio.sourceforge.net/) as a super simple, super basic ILS. It deals with inventory, borrowers, circulation, etc. but nothing terribly sophisticated. You could use it with VuFind via the Jangle connector: http://jangle.googlecode.com/svn/trunk/connectors/openbiblio/ -Ross. On Mon, Sep 27, 2010 at 6:15 PM, ... offonoffoffon...@gmail.com wrote: Hello, Some folks in the VuFind library suggested I ask here. We are starting a small library and thinking of using VuFind as our online catalog. As for the ILS we would like something small and simple (Evergreen and others seem massive for the small amount of functionality we need), and especially something which is flexible enough to allow us to base our library on book sharing rather than an institutionally owned collection. Book sharing will probably happen through creative use of inter-library loan functionality, and so an ILS that has a solid and flexible ILL is necessary. We will probably have less than 1,000 books (perhaps a couple thousand if things really take off) and less than 100 borrowers. Probably about as many book sharers (ie, partner libraries) as borrowers. Does anyone have experience with the ILSes which already have drivers for VuFind? I think the list is:
* SirsiDynix Horizon
* Sirsi Unicorn
* Voyager
* VTLS Virtua
* Innovative
* DAIA / OCLC PICA
* NewGenLib
If you can, please comment on the suitability of these ILSes for our system (low complexity and flexible ILL system). I have considered just writing some administrative scripts in python. It would be a good project and not ridiculously difficult. 
I would much rather write the whole thing from scratch than try to write a VuFind driver for an ILS not yet supported. Thanks for reading, Elliot
Re: [CODE4LIB] Looking for OAuth experts
On Mon, Sep 20, 2010 at 4:01 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Can you give some details (or references) to justify the belief that OAuth isn't ready yet? (The fact that Twitter implemented it poorly does not seem apropos to me, that's just a critique of Twitter, right?). I don't agree or disagree, just trying to take this from fud-ish rumor to facts to help me and others understand and make decisions. Agreed on this assessment, Jonathan. MJ, can you extrapolate on your concerns, because that Ars Technica article is not going to cut it for anything more than to avoid the choices that Twitter made. And even by the standards of that article, I'm not sure that OAuth is inappropriate for the ILS-DI's use cases which are: 1) server-to-server communication as the first priority 2) something relatively standardized and abstracted enough to allow for institutions' local authentication mechanisms. To quote from that article: To be clear, I don't think that OAuth is a failure or a dead end. I just don't think that it should be treated as an authentication panacea to the detriment of other important security considerations. What it comes down to is that OAuth 1.0a is a horrible solution to a very difficult problem. It works acceptably well for server-to-server authentication, but there are far too many unresolved issues in the current specification for it to be used as-is on a widespread basis for desktop applications. It's simply not mature enough yet. Even in the context of server-to-server authentication, OAuth should be viewed as a necessary evil rather than a good idea. It should be approached with extreme trepidation and the high level of caution that is warranted by such a convoluted and incomplete standard. Careless adoption can lead to serious problems, like the issues caused by Twitter's extremely poor implementation. 
As I have written in the past, I think that OAuth 2.0—the next version of the standard—will address many of the problems and will make it safer and more suitable for adoption. The current IETF version of the 2.0 draft still requires a lot of work, however. It still doesn't really provide guidance on how to handle consumer secret keys for desktop applications, for example. In light of the heavy involvement in the draft process by Facebook's David Recordon, I'm really hopeful that the official standard will adopt Facebook's sane and reasonable approach to that problem. Which basically spells out the problem the ILS-DI group is facing: an incomplete, but evolving standard with heavy industry support, or... nothing. We are still very much in the fact-gathering stage, so any suggestions are welcome. At the glacial pace of library development, I think it's safe to assume OAuth 2.0 will be less of a moving target by any implementation stage. -Ross.
Re: [CODE4LIB] Looking for OAuth experts
On Mon, Sep 20, 2010 at 5:21 PM, MJ Ray m...@phonecoop.coop wrote: Ross Singer wrote: Agreed on this assessment, Jonathan. MJ, can you extrapolate on your concerns, because that Ars Technica article is not going to cut it for anything more than to avoid the choices that Twitter made. I've just sent another message trying to do that. Hope it helps. Yes. Well, at any rate it helps me refine my problem statement some. The concern about distributed apps (while legitimate) doesn't worry me quite so much in this particular case. The main use case we would be looking to solve is for known applications to access (and, depending on how trusted they are, manipulate) confidential (again, according to the level of trust) user information in an ILS without needing to store their credentials. If there is an added bonus of being able to use it for all sorts of other, distributed applications, so much the better, but if that's not viable or secure, it's no problem since it would be outside of the necessary requirements, anyway. And even by the standards of that article, I'm not sure that OAuth is inappropriate for the ILS-DI's use cases which are: 1) server-to-server communication as the first priority 2) something relatively standardized and abstracted enough to allow for institutions' local authentication mechanisms. I think FOSS servers would be affected by the published-key spoofing flaw too, wouldn't they? There are open source OAuth server implementations out there, I assume there's some local salt-ing going on. Another key difference between ILS-DI's use case and a service like Twitter's is that, from the start, the expectation can be set that only whitelisted clients have access. I'm not sure if Johns Hopkins or Stanford or NYPL cares much if there's a teeming app marketplace that can be built on top of their ILS API as much as simple and consistent access from their discovery interfaces, courseware, electronic reserves application, etc. 
The very attributes that may make OAuth questionable for services like Twitter, Facebook, and their ilk may be non-factors for an ILS API simply because the environment can be much more controlled. The problem would be that if, indeed, these flaws do undermine public support for OAuth, the advantages it brings (client/server libraries, awareness outside of very library-specific domains) would be lost if there's no community using it. Some of the projects that want to support ILS-DI are FOSS - one of the Koha support companies signed some ILS-DI announcement IIRC, while another wrote some of the code to implement it. Which basically spells out the problem the ILS-DI group is facing: an incomplete, but evolving standard with heavy industry support, or... nothing. Glad to see it's recognised that OAuth is incomplete. Really all that's recognized is that it exists and is one of the only, if not the only, protocol that allows for the decentralization of auth/authz without the client service needing to manage personal credentials. That's not necessarily an ILS-DI requirement, but it sure would be useful if we had it. I've heard as much opposition as support among developers. On the one hand, it's more work to sell. On the other, they're now even more at the mercy of big service providers who can break their applications (and so eat their support budgets) at will. Unlike Twitter, however, we're starting from nothing. There's nothing currently invested in ILS-DI clients that would break by committing solely to OAuth (or anything, for that matter). If there is broad language support to build clients and servers, this should be less of an issue. We are still very much in the fact-gathering stage, so any suggestions are welcome. [...] If the problem that the group is trying to solve was explained on this list, readers might be able to offer suggestions. Jonathan gave a pretty good summary, but I'll tack on. 
The ILS-DI initiative was initially proposed by the Digital Library Federation to provide the following functionality out of integrated library systems:

Level 1: Basic Discovery Interfaces
* HarvestBibliographicRecords
* HarvestExpandedRecords
* GetAvailability
* GoToBibliographicRequestPage

Level 2: Elementary OPAC supplement
All of the above, plus
* HarvestAuthorityRecords
* HarvestHoldingsRecords
* GetRecord
* Search
* Scan
* GetAuthorityRecords
* Either OutputRewritablePage or OutputIntermediateFormat

Level 3: Elementary OPAC alternative
All of the above, plus
* LookupPatron
* AuthenticatePatron
* GetPatronInfo
* GetPatronStatus
* GetServices
* RenewLoan
* HoldTitle
* HoldItem
* CancelHold
* RecallItem
* CancelRecall

Level 4: Robust/domain specific discovery platforms
All of the above, plus
* SearchCourseReserves
* Explain
* Both OutputRewritablePage and OutputIntermediateFormat

It's no longer under the auspices of the DLF and the priority of functionality has changed. We're now focused first
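The level structure above is cumulative, which can be captured directly as data; this is a hypothetical sketch (it simplifies the either/or choice of OutputRewritablePage vs. OutputIntermediateFormat at level 2):

```python
# Hypothetical sketch: the DLF ILS-DI levels as cumulative sets of functions.
ILS_DI_LEVELS = {
    1: {"HarvestBibliographicRecords", "HarvestExpandedRecords",
        "GetAvailability", "GoToBibliographicRequestPage"},
    2: {"HarvestAuthorityRecords", "HarvestHoldingsRecords", "GetRecord",
        "Search", "Scan", "GetAuthorityRecords", "OutputIntermediateFormat"},
    3: {"LookupPatron", "AuthenticatePatron", "GetPatronInfo",
        "GetPatronStatus", "GetServices", "RenewLoan", "HoldTitle",
        "HoldItem", "CancelHold", "RecallItem", "CancelRecall"},
    4: {"SearchCourseReserves", "Explain", "OutputRewritablePage"},
}

def compliance_level(implemented):
    """Highest level whose cumulative function set is fully implemented."""
    level = 0
    required = set()
    for n in sorted(ILS_DI_LEVELS):
        required |= ILS_DI_LEVELS[n]
        if required <= set(implemented):
            level = n
    return level
```

Of course, as the thread notes, the DLF document specified functionality but not bindings, so "compliance" here means only that the named functions exist in some form.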
Re: [CODE4LIB] content type for rdf
It depends on how you're serving your RDF.

* RDF/XML is application/rdf+xml
* N3 is text/n3;charset=utf-8
* Turtle is text/turtle
* NTriples are text/plain

-Ross. On Fri, Aug 20, 2010 at 7:03 AM, Eric Lease Morgan emor...@nd.edu wrote: I am in the process of creating sets of cool URLs, and I need to know the best (correct) content type of RDF. Is it application/rdf+xml? Similarly, is the correct content type for HTML equal to text/html? -- Eric Morgan
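Tying those together, a minimal content-negotiation sketch (the helper and format keys are hypothetical; the MIME types are the ones Ross lists):

```python
# MIME types from the post; the negotiation helper itself is a hypothetical sketch.
RDF_CONTENT_TYPES = {
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3;charset=utf-8",
    "turtle": "text/turtle",
    "ntriples": "text/plain",
}

def pick_serialization(accept_header):
    """Return (format, content_type) for the first supported type in Accept."""
    for part in accept_header.split(","):
        mime = part.split(";")[0].strip()   # drop q-values and parameters
        for fmt, ctype in RDF_CONTENT_TYPES.items():
            if ctype.split(";")[0] == mime:
                return fmt, ctype
    return "rdfxml", RDF_CONTENT_TYPES["rdfxml"]  # fall back to RDF/XML
```

A "cool URL" setup would typically 303-redirect from the thing's URI to the document URI chosen by negotiation like this.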
Re: [CODE4LIB] open source proxy packages?
On Sun, Aug 15, 2010 at 8:10 PM, Cary Gordon listu...@chillco.com wrote: In my experience, I haven't found anything that is as easy to use (or even close) to EZProxy. Unless you value your time at under $5/hr., or are a FLOSS zealot (I think of myself as a semi-zealot), it is a bargain at $500. A significant part of that bargain is the great community that supports it. +1 I was trying to figure out how to word this in a way that wasn't discouraging or too EZProxy fanboy-ish, but I honestly could not see an alternative that, in the end, would be nearly as cost effective as EZProxy. EZProxy's (lifetime!) price, tiny footprint and support are going to be hard to beat. Art Rhyno once attributed Shibboleth's lack of uptake on EZProxy: why bother with this ultra-complicated authentication/authorization mechanism when EZProxy just works, on unlimited machines, with unlimited upgrades, for $500? -Ross. This is not to say, or course, that it is not possible to do this with Squid, which was the first effective solution, or other tools. Cary On Sat, Aug 14, 2010 at 10:05 AM, phoebe ayers phoebe.w...@gmail.com wrote: Hello all, Are there any open source proxies for libraries that have been developed, e.g. an open source alternative to EZProxy or similar? I'm working with a non-profit tech foundation that is interested in granting access to a few licensed resources to a few hundred people who are scattered around the world. thanks, Phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * -- Cary Gordon The Cherry Hill Company http://chillco.com
Re: [CODE4LIB] schema for some web page
http://dublincore.org/documents/dcmi-terms/#terms-relation This term is intended to be used with non-literal values as defined in the DCMI Abstract Model (http://dublincore.org/documents/abstract-model/). As of December 2007, the DCMI Usage Board is seeking a way to express this intention with a formal range declaration. So if you use the dcterms namespace (rather than dc elements) you should be fine. -Ross. On Thu, Jul 8, 2010 at 9:48 AM, Jonathan Rochkind rochk...@jhu.edu wrote: In my experience, you can't tell much about what you'd really want to know for user needs from the indicators or subfield 3's, at least in my catalog. FRBR relationships probably don't work because the destination of an arbitrary 856 is not necessarily a FRBR entity, and even if it is there's no way to know that (or what class of entity) from the data. It really is just a generic "some kind of related web page". So dc:relation does sound like the right vocabulary element for a generic "related web page", thanks. Is the value of dc:relation _necessarily_ a URI/URL? I hope so, because otherwise I'm not sure dc:relation is sufficient, as I really do need something that says "some related URL". Thanks for the advice, Jonathan Ed Summers wrote: On Wed, Jul 7, 2010 at 7:00 PM, Doran, Michael D do...@uta.edu wrote: Of course, subfield $3 values are not any kind of controlled vocabulary, so it's hard to do much with them programmatically. A few years ago I analyzed the subfield 3 values in the Library of Congress data up at the Internet Archive [1]. Of course it's really simple to extract, but I just pushed it up to GitHub, mainly to share the results [2]. I extracted all the subfield 3 values from the 12M? records, and then counted them up to see how often they repeated [3]. As you can see it's hardly controlled, but it might be worthwhile coming up with some simple heuristics and properties for the familiar ones: you could imagine dcterms:description being used for "Publisher description", etc. 
Of course the $3 in your catalog data might be different from LCs, but maybe we could come up with a list of common ones on a wiki somewhere, and publish a little vocabulary that covered the important relations? //Ed [1] http://www.archive.org/details/marc_records_scriblio_net [2] http://github.com/edsu/beat [3] http://github.com/edsu/beat/raw/master/types.txt
[CODE4LIB] MARC Codes for Forms of Musical Composition
Hi everybody, I just wanted to let people know I've made the MARC codes for forms of musical compositions (http://www.loc.gov/standards/valuelist/marcmuscomp.html) available as Music Ontology Genres (http://purl.org/ontology/mo/) at http://purl.org/NET/marccodes/muscomp/ They follow the same naming convention as they would in the MARC 008 or 047, so it's easy to map (that is, no lookup needed) from your MARC data: http://purl.org/NET/marccodes/muscomp/sy#genre etc. The RDF is available as well: http://purl.org/NET/marccodes/muscomp/sy.rdf I'd love any feedback/suggestions/corrections/etc. Also, you can look around to see MARC country codes, geographic area codes and language codes. Eventually I would like to get all of the MARC codes (not already modeled by LC) in there (http://www.loc.gov/standards/valuelist/). Thanks, -Ross.
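Since the URIs follow the 008/047 codes directly, the mapping really is just string concatenation; a small sketch (the embedded code list is a tiny illustrative subset of the full MARC form-of-composition list):

```python
BASE = "http://purl.org/NET/marccodes/muscomp/"

# A few codes from the MARC form-of-composition list, for illustration;
# the full list is at http://www.loc.gov/standards/valuelist/marcmuscomp.html
KNOWN_CODES = {"sy": "Symphonies", "op": "Operas", "fg": "Fugues", "jz": "Jazz"}

def muscomp_uri(code):
    """Build the genre URI for a two-letter MARC form-of-composition code."""
    code = code.lower()
    if code not in KNOWN_CODES:
        raise ValueError("unknown form-of-composition code: %r" % code)
    return "%s%s#genre" % (BASE, code)
```

So a record with 008/18-19 (or 047 $a) of "sy" links straight to http://purl.org/NET/marccodes/muscomp/sy#genre with no lookup table beyond validation.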
Re: [CODE4LIB] Planet Code4Lib RSS feed
There seems like there's a bad entry in there: http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fplanet.code4lib.org%2Fatom.xml from the C4L Journal's feed which may be screwing up the aggregated feed as a whole. -Ross. On Mon, Jun 28, 2010 at 11:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Code4libbers, anyone want to help out debugging this? I'm kind of the 'steward' of the planet code4lib, but haven't really spent much time with it understanding it technically, and won't really have any time to look at it for a while, I'm kind of swamped at work. Jonathan Steve Casburn wrote: Jonathan, Eric Lease Morgan suggested that you might be the right person to report this to... The RSS feed (in all three flavors) for Planet Code4Lib seems to have been down since June 12. I have received no new posts during that time in my newsreader (NewsFire for Mac OS X), and the webpage for each version of the RSS feed are blank. Steve
Re: [CODE4LIB] MODS and DCTERMS
On Tue, May 4, 2010 at 7:55 AM, Mike Taylor m...@indexdata.com wrote: Having read the rest of this thread, I find that nothing that's been said changes my initial gut reaction on reading this question: DO NOT USE DCTERMS. Its vocabulary is Just Plain Inadequate, and not only for esoteric cases like the Alternative Chronological Designation of First Issue or Part of Sequence field that Karen mentioned. Despite having 70 (seventy!) elements, it's lacking fundamental fields for describing articles in journals -- there are no journalTitle, volume, issue, startPage or endPage fields. That, for me, is a deal-breaker. If you're using Dublin Core as XML, I agree with this. If you're using Dublin Core as RDF (which is, honestly, the only thing it's really good for), this is a non-issue. -Ross.
Re: [CODE4LIB] MODS and DCTERMS
On Tue, May 4, 2010 at 10:26 AM, Mike Taylor m...@indexdata.com wrote: Ross, I think that got mangled in the sending -- either that, or it's some strange format that I've never seen before. That said, I am tremendously impressed by all the information you obtained there. What software did you use, how much of this did you have to feed it by hand, and how much did it intuit from existing structured datasets? Oh, that's probably not mangled, that's probably just how Turtle looks :) I'll also send it as RDF/XML. That graph was compiled by a Google Scholar search on Mike Taylor dinosaur, the Ingenta page describing your article, a text editor (TextMate) and 30 minutes of my life I'll never get back. Ok, here's the graph as RDF/XML:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <bibo:AcademicArticle rdf:nodeID="article1">
    <dcterms:abstract xml:lang="en">Xenoposeidon proneneukos gen. et sp. nov. is a neosauropod represented by BMNH R2095, a well-preserved partial mid-to-posterior dorsal vertebra from the Berriasian-Valanginian Hastings Beds Group of Ecclesbourne Glen, East Sussex, England. It was briefly described by Lydekker in 1893, but it has subsequently been overlooked. This specimen's concave cotyle, large lateral pneumatic fossae, complex system of bony laminae and camerate internal structure show that it represents a neosauropod dinosaur. However, it differs from all other sauropods in the form of its neural arch, which is taller than the centrum, covers the entire dorsal surface of the centrum, has its posterior margin continuous with that of the cotyle, and slopes forward at 35 degrees relative to the vertical. Also unique is a broad, flat area of featureless bone on the lateral face of the arch; the accessory infraparapophyseal and postzygapophyseal laminae which meet in a V; and the asymmetric neural canal, small and round posteriorly but large and teardrop-shaped anteriorly, bounded by arched supporting laminae. The specimen cannot be referred to any known sauropod genus, and clearly represents a new genus and possibly a new `family'. Other sauropod remains from the Hastings Beds Group represent basal Titanosauriformes, Titanosauria and Diplodocidae; X. proneneukos may bring to four the number of sauropod `families' represented in this unit. Sauropods may in general have been much less morphologically conservative than is usually assumed. Since neurocentral fusion is complete in R2095, it is probably from a mature or nearly mature animal. Nevertheless, size comparisons of R2095 with corresponding vertebrae in the Brachiosaurus brancai holotype HMN SII and Diplodocus carnegii holotype CM 84 suggest a rather small sauropod: perhaps 15 m long and 7600 kg in mass if built like a brachiosaurid, or 20 m and 2800 kg if built like a diplodocid.</dcterms:abstract>
    <dcterms:creator rdf:nodeID="author1"/>
    <dcterms:creator rdf:nodeID="author2"/>
    <dcterms:isPartOf rdf:nodeID="journal1"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYearMonth">2007-11</dcterms:issued>
    <dcterms:language rdf:resource="http://purl.org/NET/marccodes/languages/eng#lang"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85038094#concept"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85097127#concept"/>
    <dcterms:subject rdf:resource="http://id.loc.gov/authorities/sh85117730#concept"/>
    <dcterms:title xml:lang="en">AN UNUSUAL NEW NEOSAUROPOD DINOSAUR FROM THE LOWER CRETACEOUS HASTINGS BEDS GROUP OF EAST SUSSEX, ENGLAND</dcterms:title>
    <bibo:authorList>
      <rdf:Description>
        <rdf:first rdf:nodeID="author1"/>
        <rdf:rest>
          <rdf:Description>
            <rdf:first rdf:nodeID="author2"/>
            <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
          </rdf:Description>
        </rdf:rest>
      </rdf:Description>
    </bibo:authorList>
    <bibo:doi>10.1111/j.1475-4983.2007.00728.x</bibo:doi>
    <bibo:issue rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">6</bibo:issue>
    <bibo:numPages rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">18</bibo:numPages>
    <bibo:pageEnd rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1564</bibo:pageEnd>
    <bibo:pageStart rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1547</bibo:pageStart>
    <bibo:pages>1547-1564</bibo:pages>
    <bibo:volume rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">50</bibo:volume>
  </bibo:AcademicArticle>
  <bibo:Journal rdf:nodeID="journal1">
    <dcterms:publisher rdf:nodeID="publisher1"/>
    <dcterms:title>Palaeontology</dcterms:title>
    <bibo:issn>0031-0239</bibo:issn>
    <foaf:homepage rdf:resource="http://www3.interscience.wiley.com/journal/118531917/home?CRETRY=1&amp;SRETRY=0"/>
  </bibo:Journal>
</rdf:RDF>
Re: [CODE4LIB] It's cool to love milk and cookies
But is there a NISO standard for this? On Fri, Apr 30, 2010 at 7:13 PM, Simon Spero s...@unc.edu wrote: I like chocolate milk.
Re: [CODE4LIB] MODS and DCTERMS
Out of curiosity, what is your use case for turning this into DC? That might help those of us that are struggling to figure out where to start with trying to help you with an answer. -Ross. On Mon, May 3, 2010 at 11:46 AM, MJ Suhonos m...@suhonos.ca wrote: Thanks for your comments, guys. I was beginning to think the lack of response indicated that I'd asked something either heretical or painfully obvious. :-) That's my understanding as well. oai_dc predates the defining of the 15 legacy DC properties in the dcterms namespace, and it's my guess nobody saw a reason to update the oai_dc definition after this happened. This is at least part of my use case — we do a lot of work with OAI on both ends, and oai_dc is pretty limited due to the original 15 elements. My thinking at this point is that there's no reason we couldn't define something like oai_dcterms and use the full QDC set based on the updated profile. Right? FWIW, I'm not limited to any legacy ties; in fact, my project is aimed at pushing the newer, DC-sanctioned ideas forward, so I suspect in my case using an XML serialization that validates against http://purl.org/dc/terms/ is probably sufficient (whether that's RDF or not doesn't matter at this point). So, back to the other part of the question: has anybody seen a MODS — DCTERMS crosswalk in the wild? It looks like there's a lot of similarity between the two, but before I go too deep down that rabbit hole, I'd like to make sure someone else hasn't already experienced that, erm, joy. MJ
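A hypothetical oai_dcterms container along the lines MJ suggests might be serialized like this; the container namespace URI below is invented for illustration, since no such format has been registered:

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical container namespace -- no oai_dcterms format actually exists.
OAI_DCTERMS = "http://example.org/OAI/2.0/oai_dcterms/"

def oai_dcterms_record(fields):
    """Serialize {property: [values]} into a hypothetical oai_dcterms
    container, analogous to oai_dc's <oai_dc:dc> wrapper."""
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.Element("{%s}dcterms" % OAI_DCTERMS)
    for prop, values in sorted(fields.items()):
        for value in values:
            el = ET.SubElement(root, "{%s}%s" % (DCTERMS, prop))
            el.text = value
    return ET.tostring(root, encoding="unicode")

xml_out = oai_dcterms_record({"title": ["Xenoposeidon"], "issued": ["2007-11"]})
```

The appeal is that the full set of dcterms properties (and their refinements) becomes available where oai_dc allows only the legacy 15.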
Re: [CODE4LIB] it's cool to hate on OpenURL
On Fri, Apr 30, 2010 at 4:09 AM, Jakob Voss jakob.v...@gbv.de wrote: Am I right that neither OpenURL nor COinS strictly defines a metadata model with a set of entities/attributes/fields/you-name-it and their definition? Apparently all ContextObjects metadata formats are based on non-normative implementation guidelines only ?? You are right. Z39.88 (and, by extension, COinS) really only defines the ContextObject itself. So it defines the carrier package, its administrative elements, referents, referrers, referring entities, services, requester and resolver and their transports. It doesn't really specify what should actually go into any of those slots. The idea is that it defers to the community profiles for that. In the XML context object, you can send more than one metadata-by-val element (or metadata-by-ref) per entity (ref, rfr, rfe, svc, req, res) - I'm not sure what is supposed to happen, for example, if you send a referent that has multiple MBV elements that don't actually describe the same thing. -Ross.
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
On Fri, Apr 30, 2010 at 7:59 AM, Kyle Banerjee kyle.baner...@gmail.com wrote: An obvious thing for a resolver to be able to do is return results in JSON so the OpenURL can be more than a static link. But since the standard defines no such response, the site generating the OpenURL would have to know something about the resolver. I actually think this lack of any specified response format is a large factor in the stagnation of OpenURL as a technology. Since a resolver is under no obligation to do anything but present a web page it's difficult for local entrepreneurial types to build upon the infrastructure simply because there are no guarantees that it will work anywhere else (or even locally, depending on your vendor, I suppose), much less contribute back to the ecosystem. Umlaut was able to exist because (for better or worse) SFX has an XML output. It has never been able to scale horizontally, however, because to work with another vendor's link resolver (which should actually be quite straightforward) it requires a connector to whatever *their* proprietary API needs. I could definitely see a project like Umlaut providing a 'de facto' machine readable response for SAP 1/2 requests that content providers could then use to start offering better integration at *their* end. This assumes that more than 5 libraries would actually be using it, however. -Ross.
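For illustration, the kind of machine-readable resolver response Ross says the standard never specified might look something like this; the shape is entirely hypothetical (Z39.88 defines no response format), loosely modeled on the service lists resolvers render as HTML:

```python
import json

# Entirely hypothetical response shape -- Z39.88 specifies no such format.
resolver_response = {
    "ctx_id": "12345",
    "referent": {"genre": "article", "issn": "0031-0239",
                 "atitle": "An unusual new neosauropod dinosaur"},
    "services": [
        {"type": "fulltext", "provider": "Wiley",
         "url": "http://example.com/fulltext/10.1111/xyz"},
        {"type": "holdings", "provider": "Local catalog",
         "url": "http://example.com/catalog/record/1"},
    ],
}

def fulltext_targets(response):
    """Pull out just the full-text links a client could render inline."""
    return [s["url"] for s in response["services"] if s["type"] == "fulltext"]
```

With something like this standardized, a content provider could turn a static OpenURL link into an inline "get full text" affordance without knowing which vendor's resolver sits behind it.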
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 8:17 AM, MJ Suhonos m...@suhonos.ca wrote: Okay, I know it's cool to hate on OpenURL, but I feel I have to clarify a few points: It's not that it's cool to hate on OpenURL, but if you've really worked with it it's easy to grow bitter. snip Maybe if I put it that way, OpenURL sounds a little less crappy. No, OpenURL is still crappy and it will always be crappy, I'm afraid, because it's tremendously complicated, mainly from the fact that it tries to do too much. The reason that context-sensitive services based on bibliographic citations comprise 99% of all OpenURL activity is because: A) that was the problem it was originally designed to solve B) it's the only thing it really does well (and OpenURL 1.0's insistence on being able to solve any problem almost takes that strength away from it) The barriers to entry + the complexity of implementation almost guarantee that there's a better or, at any rate, easier alternative to any problem. The difference between OpenURL and DublinCore is that the RDF community picked up on DC because it was simple and did exactly what they needed (and nothing more). A better analogy would be Z39.50 or SRU: two non-library-specific protocols that, for their own reasons, haven't seen much uptake outside of the library community. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 10:32 AM, Rosalyn Metz rosalynm...@gmail.com wrote: I'm going to throw in my two cents. I dont think (and correct me if i'm wrong) we have mentioned once what a user might actually put in a twitter annotation. a book title? an article title? a link? I think the idea is these would be machine generated from an application. So, imagine LT, Amazon, Delicious Library or SFX having a Tweet this! button and *that* provides the annotation (not the user). i think creating some super complicated thing for a twitter annotation dooms it to failure. after all, its twitter...make it short and sweet. Indeed, it's limited. also the 1.0 document for OpenURL isn't really that bad (yes I have read it). a good portion of it is a chart with the different metadata elements. also open url could conceivably refer to an animal and then link to a bunch of resources on that animal, but no one has done that. i don't think that's a problem with OpenURL i think thats a problem with the metadata sent by vendors to link resolvers and librarians lack of creativity (yes i did make a ridiculous generalization that was not intended to offend anyone but inevitably it will). having been a vendor who has worked with openurl, i know that the information databases send seriously affects what you can actually do in a link resolver. No, this is the mythical promise of 1.0, but delivery is, frankly, much more complicated than that. It is impractical to expect an OpenURL link resolver to make sense of any old thing you throw at it and return sensible results. This is the point of the community profiles, to narrow the infinite possibilities a bit. None of our current profiles would support the scenario you speak of and I would be surprised if such a service were to be devised, that it would be built on OpenURL. 
I think it's very easy to underestimate how complicated it is to actually build something using OpenURL since in the abstract it seems like a very logical solution to any problem. -Ross. On Thu, Apr 29, 2010 at 10:23 AM, Tim Spalding t...@librarything.com wrote: Can we just hold a vote or something? I'm happy to do whatever the community here wants and will actually use. I want to do something that will be usable by others. I also favor something dead simple, so it will be implemented. If we don't reach some sort of conclusion, this is an interesting waste of time. I propose only people engaged in doing something along these lines get to vote? Tim
Re: [CODE4LIB] Twitter annotations and library software
On Thu, Apr 29, 2010 at 11:21 AM, Jonathan Rochkind rochk...@jhu.edu wrote: (Last time I looked at Bibo, I recall there was no place to put a standard identifier like a DOI. So maybe using Bibo + URI for standard identifier would suffice. etc.) BIBO has all sorts of identifiers (including DOI): http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/dataproperties/doi___1125128004.html As well as ISBN (10 and 13), ISSN/e-issn, LCCN, EAN, OCLCNUM, and more. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
I still don't really see how what you're talking about would practically be accomplished. For one, to have rft.subject, like you mention, would require using the Dublin Core context set. Since that wouldn't be useful on its own for the services that link resolvers currently offer, OpenURL sources (i.e., A&I database providers) would have to support SAP 2 (XML) context objects so they can pass the book/journal/patent/etc. referent metadata along with the Dublin Core referent metadata. It also becomes a POST rather than a simple link (GET). What I'm saying is it ups the requirements on all ends of the ecosystem, for what? What you're talking about would be *much* more easily implemented via SRU and CQL (or OpenSearch), anyway, since your example is really performing a search. Since OpenURL doesn't have any semblance of a standardized response format, a client wouldn't know what to do with the response, anyway. -Ross. On Thu, Apr 29, 2010 at 11:29 AM, Rosalyn Metz rosalynm...@gmail.com wrote: ok right now exlibris has a recommender service for sfx that stores metadata from an openurl. let's say a vendor bothered to pass an element like rft.subject=hippo (which is unlikely to happen since they can't even pass an issn half the time). that subject got stored in the recommender service. next time a child saw something in ebsco animals about hippos they could click the find this button (or whatever it says) and the recommender service could bring up everything on hippos. the openurl that would be passed would be something like http://your.linkresolver.com/name?rft.subject=hippo yes this is simplistic, but it's more creative than, say, doing something boring like just bringing up the full text or doing something half-assed creative like bringing up articles that are cited in the footnotes. 
and to say something like rft.subject (or whatever it might be called) is out of the scope of group profiles is a little absurd since we are talking about things that already have subjects attached to them (see any database or other library related system). of course you'll probably want to talk about next how subjects aren't standardized and that makes it impossible. that is true, but that isn't openurl's fault or the link resolver's fault, that's the database vendors who refuse to get with the program.
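Since Ross's point is that Rosalyn's hippo scenario is really a search, here is a rough sketch of what the same lookup could look like as an SRU searchRetrieve request with a CQL query. The endpoint URL is invented; the parameter names come from the SRU specification.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; operation/version/query/maximumRecords are
# standard SRU 1.1 searchRetrieve parameters, and the query is CQL.
base = "http://sru.example.org/catalog"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.subject = "hippopotamus"',
    "maximumRecords": "10",
}
url = base + "?" + urlencode(params)
print(url)
```

Unlike OpenURL, SRU also defines the shape of the response (an XML record list), so a client knows what to do with what comes back.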
Re: [CODE4LIB] Microsoft Zentity
On Wed, Apr 28, 2010 at 10:21 AM, Houghton,Andrew hough...@oclc.org wrote: If it's open source, I assume that it could be adapted to run under Mono and then you could run it on Linux, Macs, etc. It may even run under Mono, don't know, haven't played with it. Well, it requires SQLServer, so I think this is probably going to be much more difficult than it's worth. -Ross.
Re: [CODE4LIB] Microsoft Zentity
On Wed, Apr 28, 2010 at 10:17 AM, Ethan Gruber ewg4x...@gmail.com wrote: It seems to me that the major flaw of the software is that it isn't cross-platform, which comes as no surprise. But I feel Microsoft didn't do their market research. While the financial and business sectors are heavily reliant on Microsoft servers, American universities, and by extension, research libraries, are not. If they really wanted to make a commitment to support the academic community as they say on the Zentity website, they would have developed it for a platform that the academic community actually uses. This seems like sort of a snotty answer, honestly, and I find three flaws with it: 1) Research and intellectual output is not exclusive to large research universities, which means repositories should not be exclusive to ARL libraries 2) There are lots of academic Microsoft shops, esp. at the campus IT (or departmental IT) level. It's not beyond reason to think that a smaller university would prefer the repository be hosted by central IT (or that the chemistry department or engineering school in a larger university host their own repository). 3) E-Prints, for example, seems to be making an effort to commoditize and democratize the repository space a bit by making it as simple as possible to run an IR. MS is making this even simpler for places that already have Windows servers (which is a lot of places). There are plenty of reasons to criticize Microsoft, but I just don't see how Zentity is one of them. -Ross.
Re: [CODE4LIB] Twitter annotations and library software
On Tue, Apr 27, 2010 at 7:02 AM, Jakob Voss jakob.v...@gbv.de wrote: The purpose of description can best be served by a format that can easily be displayed for human beings. You can either use a simple string or a well-known format. A string can be displayed but people will put all different citation formats in there. Right now there are only two established metadata formats that aim at creating a citation: a) BibTeX b) The input format of the Citation Style Language (CSL) This isn't entirely true. There's RIS (http://en.wikipedia.org/wiki/RIS_%28file_format%29), and BIBO (http://bibliontology.com/) is starting to become quite common in the linked data sphere. There's also BibJSON (http://www.bibkn.org/bibjson/index.html), which I've had open in a browser tab for months with the intention of actually looking at, and which actually seems quite well suited for how Twitter will store annotations. My opinion of it all along, however, has been very similar to yours -- why another citation format and why bind it so closely to a particular serialization? -Ross.
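To show what the formats in this thread actually look like, here is a toy rendering of one citation record as BibTeX and as RIS. The record and field choices are illustrative only, not any published profile of either format.

```python
# A toy citation record, rendered two ways to make the thread's formats
# concrete. Field selection here is illustrative, not a specification.
cite = {
    "type": "article",
    "key": "smith2010",
    "author": "Smith, Jane",
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "year": "2010",
}

def to_bibtex(c):
    fields = ",\n".join(
        f"  {k} = {{{c[k]}}}" for k in ("author", "title", "journal", "year")
    )
    return f"@{c['type']}{{{c['key']},\n{fields}\n}}"

def to_ris(c):
    # Common RIS tags: TY (type), AU (author), TI (title), JO (journal),
    # PY (year), ER (end of record)
    lines = ["TY  - JOUR", f"AU  - {c['author']}", f"TI  - {c['title']}",
             f"JO  - {c['journal']}", f"PY  - {c['year']}", "ER  - "]
    return "\n".join(lines)

print(to_bibtex(cite))
print(to_ris(cite))
```

A JSON serialization of the `cite` dict above is roughly the kind of thing BibJSON aims at: the same citation data, unbound from any one text syntax.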
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
The advantage of the NoSQL DBs is that they're schema-less, which allows much more flexibility in your data going in. However, it sounds like your schema may be pretty standardized -- I'm not sure what huge advantage (outside the aforementioned replication functionality) you'd get. -Ross. On Mon, Apr 12, 2010 at 10:55 AM, Thomas Dowling tdowl...@ohiolink.edu wrote: So let's say (hypothetically, of course) that a colleague tells you he's considering a NoSQL database like MongoDB or CouchDB, to store a couple tens of millions of documents, where a document is pretty much an article citation, abstract, and the location of full text (not the full text itself). Would your reaction be: That's a sensible, forward-looking approach. Lots of sites are putting lots of data into these databases and they'll only get better. This guy's on the bleeding edge. Personally, I'd hold off, but it could work. Schedule that 2012 re-migration to Oracle or Postgres now. Bwahahahah!!! Or something else? (http://en.wikipedia.org/wiki/NoSQL is a good jumping-in point.) -- Thomas Dowling tdowl...@ohiolink.edu
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
On Mon, Apr 12, 2010 at 12:22 PM, Jonathan Rochkind rochk...@jhu.edu wrote: The thing is, the NoSQL stuff is pretty much just a key-value store. There's generally no way to query the store; instead you can simply look up a document by ID. Actually, this depends largely on the NoSQL DBMS in question. Some are key-value stores (Redis, Tokyo Cabinet, Cassandra), some are document-based (CouchDB, MongoDB), some are graph-based (Neo4J), so I think blanket statements like this are somewhat misleading. CouchDB and MongoDB (for example) have the capacity to index the values within the document - you don't just have to look up things by document ID. -Ross.
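Ross's distinction can be illustrated with a toy in-memory model: schema-less documents keyed by ID, plus a secondary index over a field's values, which is roughly what CouchDB views and MongoDB indexes give you. This is plain Python for illustration, not client code for either database.

```python
from collections import defaultdict

# Two schema-less "documents": doc2 simply lacks a subject field.
docs = {
    "doc1": {"title": "On Hippos", "year": 2009, "subject": "hippopotamus"},
    "doc2": {"title": "On Rivers", "year": 2010},
}

def build_index(documents, field):
    """Map each value of `field` to the IDs of documents containing it --
    a toy stand-in for a document store's secondary index."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index

by_subject = build_index(docs, "subject")
print(by_subject["hippopotamus"])  # lookup by value, not by document ID
```

The point is only that "you can just look things up by ID" describes key-value stores, not document stores.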
Re: [CODE4LIB] OpenURL aggregator not doing so well
Yes, although, the problem is actually with Connotea: http://www.connotea.org/article/4c40adbf8ecaef53b3772b5a141e229d So we either need to talk to NPG or drop Connotea from the OpenURL planet. -Ross. On Fri, Apr 9, 2010 at 8:00 AM, Eric Hellman e...@hellman.net wrote: Take a look at http://openurl.code4lib.org/aggregator Any ideas how to make it work better? Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/
Re: [CODE4LIB] local c4l chatter and the listserv
Or if their regional mailing list could send the main list a digest email. Or something. -Ross. On Fri, Apr 9, 2010 at 6:30 PM, Frumkin, Jeremy frumk...@u.library.arizona.edu wrote: And by 'end case' of course I meant 'edge case'. -- jaf On Apr 9, 2010, at 3:26 PM, Frumkin, Jeremy wrote: Seems a bit complex to me. I'd be happy if people just remembered to announce things on the main list, such as we're holding this here event, and/or if you are interested in this event, sign up on this related discussion list. I'm not a big fan of architecting to an end case, and it feels like that's what this is. -- jaf On Apr 9, 2010, at 3:08 PM, Jonathan Rochkind wrote: We are stuck between two problems, with some people thinking only one of these is/would be a problem, and others not caring at all either way: * Local conference/meetup planning chatter overwhelms the listserv when it's on the main listserv * People don't find out about local conferences/meetups they are interested in when local chatter is somewhere else. My first thought is, gee, this really calls for some kind of threaded forum software, where people can subscribe to only the threads they want. But then I remember, a) that kind of software always sucks, and b) there must be a better web 2.0y way to do it. Just as hypothetical brainstorming, what if we did this: 1. Local code4lib groups are required (i.e., strongly strongly strongly encouraged, we can't really require anyone to do anything) to, if they have a local listserv at all, have that listserv somewhere that: a) Has _publically viewable archives_ b) Has an RSS-or-Atom feed of those archives, which requires no authentication to subscribe to [Google groups is one very easy way to get both those things, but certainly not the only one] 2. All those local listservs are listed on a wiki page, which local groups are required to add their listserv to. 3. We set up a planet aggregator of all those listserv's RSS. 4. Profit! 
That is, now: * People can sign up for an individual listserv they want, if they want. * People can view the up-to-date 'archives' of an individual listserv on the web if they want; * people can view the up-to-date 'archives' of the _aggregated_ C4L Local communication, via the aggregator. Using one of the many free RSS-to-email services on the web, people can sign up for an email subscription to the AGGREGATED C4L Local traffic, getting what some want to get with just one more subscription. That last part about the RSS-to-email thing is important for our 'requirements', but is kind of the sketchiest part. Potentially better is if we write our OWN RSS-to-email service (maybe one that will only allow subscriptions to the C4L Aggregator or one of its components), which we know will work okay, and which does some clever mail header munging so hitting reply to all on an email you get from the aggregator rss-to-email will send your message to the original listserv, so you really can treat your aggregator subscription just like a listserv if you want. Just brainstorming here. Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Gabriel Farrell [gsf...@gmail.com] Sent: Friday, April 09, 2010 4:47 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib North planning continues I'm hoping to attend the upcoming code4libnorth meeting because I heart Canada, but I'd rather not join yet another mailing list. If it gets canceled or something tell us on this list or put it on the wiki page, please? On Thu, Apr 8, 2010 at 11:46 AM, Walker, David dwal...@calstate.edu wrote: I'm not on that conference list, so don't really know how much traffic it gets. But it seems to me that, since these regional conferences are mostly being held at different times of the year from the main conference, the overlap would be minimal. Or not. I don't know. 
--Dave == David Walker Library Web Services Manager California State University http://xerxes.calstate.edu/ From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of William Denton [...@pobox.com] Sent: Thursday, April 08, 2010 7:45 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib North planning continues On 8 April 2010, Walker, David quoted: I think a good compromise is to have local meeting conversations on the code4libcon google group. That list is for organizing the main conference, with details about getting rooms, food, shuttle buses, hotel booking agents, who can MC Thursday afternoon, etc. Mixing that with organizational details *and* general discussion about all local chapter meetings would confuse everything, I think. Bill -- William
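Jonathan's "clever mail header munging" idea above is straightforward in practice: an RSS-to-email service sets Reply-To on outgoing mail so that replies go back to the originating local listserv rather than the aggregator. A minimal sketch with Python's standard library; all addresses here are made up.

```python
from email.message import EmailMessage

def aggregator_mail(subject, body, original_list):
    """Build an aggregator notification whose Reply-To points back at the
    local listserv the item came from, so replying rejoins that discussion."""
    msg = EmailMessage()
    msg["From"] = "aggregator@example.org"      # hypothetical aggregator address
    msg["To"] = "subscriber@example.com"        # hypothetical subscriber
    msg["Reply-To"] = original_list             # the header-munging trick
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

msg = aggregator_mail("[c4l-north] Meetup planning", "Details inside...",
                      "c4l-north@example.ca")
print(msg["Reply-To"])
```

This only covers the reply path; actually sending the mail and polling the RSS feeds would be separate pieces.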
Re: [CODE4LIB] newbie
sexy groovy - 43,200 On Thu, Mar 25, 2010 at 10:36 PM, Andrew Hankinson andrew.hankin...@gmail.com wrote: Just out of curiosity I tried them in quotes: sexy ruby - 72,200 sexy python - 37,900 sexy php - 25,100 sexy java - 16,100 sexy asp - 14,800 sexy perl - 8,080 sexy C++ - 177 sexy FORTRAN - 67 sexy COBOL - 8 I tried sexy lisp but the results were skewed by speech impediment fetishes. Which I'd say is even less strange than 8 people thinking you can write sexy COBOL. On 2010-03-25, at 10:20 PM, Tim Spalding wrote: Finally, I never would have put the strings PHP and sexiness in a sentence together (though I guess I just did). A simple Google search shows how very wrong you are: sexy php - 56,100,000 results sexy asp - 8,380,000 sexy java - 6,360,000 sexy ruby - 2,840,000 sexy perl - 532,000 sexy C++ - 488,000 sexy smalltalk - 113,000 sexy fortran - 107,000 sexy COBOL - 58,100 There are also very high results for sexy logo. Perhaps, since I was in fourth grade, someone's figured out something interesting to do with that stupid turtle! Tim
Re: [CODE4LIB] PHP bashing (was: newbie)
On Fri, Mar 26, 2010 at 10:22 AM, Mike Taylor m...@indexdata.com wrote: For someone who is just starting out in programming, I think the very last thing you want is a verbose language that makes you spend half your time talking about types that you don't really care about. I'm not saying there isn't a time and a place for static type-checking, but learning to program isn't it. +1 I couldn't agree more. To all points. And now that we know your language of choice, we are anxiously awaiting your MARC-8 support patch to ruby-marc, Mike. -Ross.
Re: [CODE4LIB] newbie
On Thu, Mar 25, 2010 at 12:29 PM, Aaron Rubinstein arubi...@library.umass.edu wrote: This is some of the best advice. Reading and adapting good code has been my favorite way to learn. There was a discussion a couple years back on a code4lib code repository of some kind[1]. I'd love to resurrect this idea. A private pastebin[2] might be a decent option. I also know that a number of us use GitHub[3], which allows for collecting syntax-highlighted code snippets and has some nifty social networking features that let you follow other coders and projects. GitHub is certainly not a solution for a code4lib repository but is another way to share code and learn from each other. I disagreed with this back in the day, and I still disagree with running our own code repository. There are too many good code hosting solutions out there for this to be justifiable. We used to run an SVN repo at code4lib.org, but we never bothered rebuilding it after our server got hacked. Actually I think GitHub/Google Code and their ilk are a much better solution -- especially for pastebins/gists/etc. What would be useful, though, is an aggregation of the code4lib community's activity spread across these sites, sort of like what the Planet does for blog postings, etc. or what Google Buzz does for the people I follow (i.e. I see their gists). I'd buy in to that (and help support it), but I'm not sure how one would go about it. -Ross.
Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas
On Mon, Mar 22, 2010 at 1:09 PM, Karen Coyle li...@kcoyle.net wrote: the records... It might work, I really want to try to model this. Wish we could get some folks together for a 1/2 day somewhere and JUST DO IT. +1 to this. Maybe a whole day or two, though. I totally agree we're past the point of hand-waviness and just need to model this stuff /pragmatically/ (i.e. in a manner we think we could actually use), at scale, and have something to point to. And then release whatever comes out of it so others can do the same thing. Honestly, I believe we're at a stage of librarian-exhaustion over RDA and FRBR that the first decent working example of this, however removed from the actual specs, will become the de facto standard. -Ross.
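As a starting point for the pragmatic modeling being discussed, the FRBR Group 1 chain (Work, Expression, Manifestation, Item) can be sketched as nested data structures. This is a deliberately crude illustration; the attribute names are invented for the example, not drawn from the FRBR or RDA vocabularies.

```python
from dataclasses import dataclass, field
from typing import List

# Toy sketch of the FRBR Group 1 hierarchy. Each level holds its children,
# so one Work fans out to the physical Items that libraries actually hold.
@dataclass
class Item:
    barcode: str

@dataclass
class Manifestation:
    isbn: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

moby = Work("Moby Dick",
            [Expression("eng",
                        [Manifestation("0451526996", [Item("b0001")])])])
print(moby.expressions[0].manifestations[0].items[0].barcode)
```

The interesting (and contentious) part at scale is deciding where real catalog data lands in this hierarchy, which is exactly the exercise Karen is proposing.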
Re: [CODE4LIB] Any examples of using OAI-ORE for aggregation?
Joe, I'm not sure if this conforms to what you're talking about, but have you seen the Library of Congress' OAI-ORE implementation for Chronicling America? http://chroniclingamerica.loc.gov/ http://chroniclingamerica.loc.gov/lccn/sn83030214.rdf -Ross. On Wed, Mar 10, 2010 at 1:44 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: Most of the examples I've seen of OAI-ORE seem to assume that you're ultimately interested in only one object within the resource map -- effectively, it's content negotiation. Has anyone ever played with using ORE to point at an aggregation, with the expectation that the user will be interested in all parts, and automatically download them? ... Let me give a concrete example: A user searches for some data ... we find (x) number of records that match their criteria, and they then weed the list down to 10 files of interest. We then save this request as a Resource Map, as part of an OAIS order. I then want to be able to hand this off to a browser / downloader / whatever to try to obtain the individual files. Currently, I have something that can take the request, and create a tarball on the fly, but we have the unfortunate situation when some of the data is near-line and/or has to be regenerated -- I'm trying to find a good way to effectively fork the request into multiple smaller request, some of which I can service now, and some for which I can return an HTTP 503 status (service unavailable) w/ a retry-after header. ... Has anyone ever tried doing something like this? Should I even be looking at ORE, or is there something that better fits with what I'm trying to do? Thanks for any advice / insight you can give -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center
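The fork Joe describes, splitting an aggregation into parts that can be served now and near-line parts answered with HTTP 503 plus a Retry-After header, can be sketched independently of any framework. The availability check and filenames here are stand-ins for whatever the archive actually knows about its storage.

```python
# Toy model: a set of files that are online right now; anything else is
# assumed near-line and must be regenerated, so we answer 503 with a
# Retry-After header per the HTTP spec.
ONLINE = {"file1.fits", "file2.fits"}

def respond(filename, retry_after_seconds=3600):
    """Return an (status, headers, body) tuple for one file in the order."""
    if filename in ONLINE:
        return (200, {}, f"contents of {filename}")
    return (503, {"Retry-After": str(retry_after_seconds)}, "")

order = ["file1.fits", "nearline.fits", "file2.fits"]
statuses = [respond(f)[0] for f in order]
print(statuses)
```

A client walking the resource map would download the 200s immediately and re-poll the 503s after the indicated delay.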
Re: [CODE4LIB] Vote for Code4Lib 2011 host is OPEN
Polls close midnight EDT March 23. May the best city win, -Ross. On Fri, Mar 12, 2010 at 5:37 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: Folks, We received three excellent proposals for hosting the 2011 conference, and now it is time to vote on them! Voting is open for a week. (Actually, I don't know the close date/time but we should have a week or so to vote. Ross?) How to vote: 1. Go here: http://vote.code4lib.org/election/index/15 2. Log in using your code4lib.org credentials (register at code4lib.org if you haven't done so already). If you have trouble authenticating, contact myself and Ryan Wick (ryanwick at gmail). 3. Click on a host's name to reveal a link to the full proposal 4. Assign each proposal a rank from 0 to 3, 0 being least desirable and 3 being the most. Please keep the conference requirements and desirables in mind as you make your selection: http://code4lib.org/conference/hosting 5. Once you are satisfied with your rankings, click Cast your ballot. 6. Want to change your rankings? You can! As often as you'd like, even, up until the vote closes. Feel free to watch http://vote.code4lib.org/election/results/15 for returns, or hop into irc://irc.freenode.net/code4lib and type @hosts2011. Thanks to Ross Works Hard For The Money Singer for setting the vote up, as always! -Mike
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification. Quite frankly, I think I (and I imagine others) would much rather see a more open, RFC-style process for creating a marc-json spec than "I talked to LC and here you go." Maybe I'm misreading this last paragraph a bit, however. -Ross.
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Young byo...@bigbluehat.com wrote: A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options. marc-json and BibJSON serve two different purposes: marc-json would need to be a lossless serialization of a MARC record, which may or may not contain bibliographic data (it may be an authority, holding or CID record, for example). BibJSON is more of a merging of data model and serialization (which, admittedly, is no stranger to MARC) for the purpose of bibliographic /citations/. So it will probably be lossy, and there would most likely be a lot of MARC data that is out of scope. That's not to say it wouldn't be useful to figure out how to get from MARC to BibJSON, but from my perspective it's difficult to see the advantage it brings (being tied to JSON) vs. BIBO. -Ross.
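One possible shape for the lossless MARC-in-JSON serialization Ross describes, keeping the leader, control fields, and data fields with indicators and ordered, repeatable subfields, is sketched below. This is an illustration of the requirements, not any published specification.

```python
import json

# Hypothetical lossless MARC-as-JSON shape: nothing from the record is
# discarded, and field/subfield order is preserved by using lists.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"tag": "001", "data": "12345"},          # control field
        {"tag": "245", "ind1": "1", "ind2": "0",  # data field
         "subfields": [{"a": "An example title /"},
                       {"c": "by Jane Smith."}]},
    ],
}

serialized = json.dumps(record)
roundtrip = json.loads(serialized)
print(roundtrip == record)  # the full structure survives a round trip
```

Contrast this with a citation format like BibJSON, where an authority or holdings record simply has no sensible representation.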
Re: [CODE4LIB] Code4Lib 2011 Proposals
The date is not etched in stone. -Ross. On Wed, Mar 3, 2010 at 9:35 AM, Ethan Gruber ewg4x...@gmail.com wrote: Ithaca in February sounds kind of depressing, honestly. On Wed, Mar 3, 2010 at 9:27 AM, Ma, Hong h...@miami.edu wrote: Agree with Carol. Austin is good. Thanks, Hong Hong Ma Information Systems Librarian Otto G. Richter Library University of Miami 1300 Memorial Dr., Rm.301-A Coral Gables, FL 33124 h...@miami.edu (305) 284-8844 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Carol Bean Sent: Wednesday, March 03, 2010 9:06 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals Snowy northern climes-- Carol (still hoping for a bid from Austin) From: Kevin S. Clarke kscla...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Date: 03/03/2010 09:00 AM Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals Sent by: Code for Libraries CODE4LIB@LISTSERV.ND.EDU On Wed, Mar 3, 2010 at 6:35 AM, John Fereira ja...@cornell.edu wrote: I've got a bit of conference planning burnout after being on the planning committee for the Jasig conference for the sixth time in a row but I'm inclined to throw out Ithaca, NY as a possible location for 2011. ooh, +1 ... I was born in Ithaca, but haven't been back since; I'd love an excuse to visit and explore! From what I hear, it would make a nice venue for c4l11. Kevin
Re: [CODE4LIB] Code4Lib 2011 Proposals
On Wed, Mar 3, 2010 at 9:55 AM, Paul Joseph pjjos...@gmail.com wrote: No need to be concerned about the vendors: they're the same suspects who sponsored C4L10. Just to be clear on this -- the same suspects actually shelled out far less for C4L10 than they had in the past. And we had far fewer sponsors than we had in, say, Portland (which required similar economic gymnastics, in a much stronger economy, to keep it affordable). -Ross.
Re: [CODE4LIB] Asheville Brews Cruise details payment info
HOW MANY PIZZAS CAN I PUT YOU DOWN FOR? Love, -Talis. Hi all, It is time to reveal the details about the Brews Cruise social activity planned for next Tuesday night at the Code4Lib 2010 conference [1]. Let's keep list noise to a minimum, so folks who have questions about the details, please e-mail me directly or, if it's discussion-worthy, stick to the code4libcon list. First off, the event is full. Sorry if you missed the cut. We were forced to set a limit of 48 persons because that's the max number of folks that will fit into two party buses, plus we don't want to overwhelm the staff at the breweries. There is, however, a waitlist that someone started on the sign-up page [2]. Secondly, I want to thank Talis [3] for stepping up and sponsoring a portion of this event. Our first stop on the cruise will be a brewery slash pizza joint and Talis has generously offered to pay for our pizza. Yay! Cost and payment options: The cost for the cruise is $40 per person. You have two options for paying: 1) Pay in advance by sending me $40 via PayPal. 2) Bring $40 with you on the night of the cruise. I've been told they have a hand-held credit card machine for the cash-strapped. Anyone who wants to can pay via PayPal, but I need at least 16 people to choose this option because the tour company wants to pre-bill my credit card for a minimum of 16 guests. There should be no fees involved if the money comes from your PayPal account or an associated bank account. The deadline for paying in advance is EOD Sunday, February 21st. 
If you wish to prepay via Paypal--you know you want to--here are the instructions: 1) Go to http://paypal.com 2) Click on Send Money 3) Enter lb...@reallywow.com in the To field 4) Enter your own address in the From field (unless you're logged in) 5) Click the Personal tab and choose Payment owed from the options 6) Click Continue 7) On the next page you can specify a message Subject of Brews Cruise Itinerary: - Pickup from the hotel is tentatively scheduled for 6:15pm. Those who haven't pre-paid should try to get there a little early. - Stop #1 will be the Asheville Pizza Brewing Co. where we will sample 16-20 different beers and consume our delicious, alcohol-absorbing, Talis-sponsored pizza. - Stop #2 will be Highland Brewing Company, Asheville's 1st and largest brewing company - Stop #3 will be the French Broad Brewery which specializes in a variety of European style beers. - Expected return to the hotel is around 9:30-10pm Thanks for signing up! I think it's going to be a great time! --jay PS, did I mention Talis is paying for the pizza! Yay, Talis! PPS, Talis employee, Ross Singer, will be attending the event. Be sure to ask him about Platforms. [1] http://wiki.code4lib.org/index.php/C4L2010_social_activities#Asheville_Brews_Cruise [2] http://wiki.code4lib.org/index.php/C4L2010_social_activities#Wait_List [3] http://www.talis.com/
Re: [CODE4LIB] Rails Hosting
Have you looked at Heroku (http://heroku.com/)? I've only used their freebie plan (so I have no idea how they compare pricewise), but it's been fantastic to get Ruby apps running there. Dreamhost also provides Passenger to their customers (http://wiki.dreamhost.com/Passenger) so that might be an option, too. -Ross. On Thu, Jan 14, 2010 at 11:15 AM, Kevin Reiss reiss.ke...@yahoo.com wrote: Hi, I was curious if anyone could recommend a hosting service that they've had a good Ruby on Rails experience with. I've been working with Bluehost but my experience has not been good. You need to jump through a lot of hoops just to get a moderately complicated Rails application running properly. The applications we are looking at deploying would be moderately active, 1,000-2,000 visits a day. Thanks for any comments in advance. Regards, Kevin Reiss
Re: [CODE4LIB] Rails Hosting
I think one thing to consider between Heroku and something like Slicehost is what exactly you have the resources/willingness to support. One of the things I've really liked about Heroku is that I basically just have to worry about my Ruby app, not maintaining a server environment. On the other hand, it's somewhat limiting as to what I can do there, so it's not a solution to every problem. -Ross. On Thu, Jan 14, 2010 at 11:44 AM, Rosalyn Metz rosalynm...@gmail.com wrote: Hi Kevin, I'm going to recommend Slicehost also. Again, I haven't used it but I met the (former) owner. He sold the business to Rackspace, which has an awesome reputation in the cloud computing world. They are #2 behind Amazon. Rosalyn On Thu, Jan 14, 2010 at 11:40 AM, Doran, Michael D do...@uta.edu wrote: Hi Kevin, Although I can't recommend any hosting based on personal experience, a while back I had bookmarked a recommended (by another code4libber) hosting site: Slicehost at http://www.slicehost.com/ I think they pretty much get out of the way and let you do what you want, development wise. Regarding Rails in particular, one of their testimonials said The only thing I can say is Wow! ... Rails up and running in 30 minutes. Another said ...I’m a Rails developer and a Linux enthusiast who can’t believe he found a Gentoo VPS with 256MB RAM for $20/month. And yet another ...I’m a freelance Rails developer, and my experience on an Ubuntu VPS has been fantastic compared to my previous shared hosting experience. [1] Again, this is *not* a recommendation from personal experience. 
-- Michael [1] http://www.slicehost.com/why-slicehost/testimonials # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kevin Reiss Sent: Thursday, January 14, 2010 10:16 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Rails Hosting Hi, I was curious if anyone could recommend a hosting service that they've had a good Ruby on Rails experience with. I've been working with Bluehost but my experience has not been good. You need to jump through a lot of hoops just to get a moderately complicated Rails application running properly. The applications we are looking at deploying would be moderately active, 1,000-2,000 visits a day. Thanks for any comments in advance. Regards, Kevin Reiss
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
I definitely agree with Bill here. There is definitely a totemistic attitude about vim or emacs being all the IDE I need. Knowing your way around vim (or possibly emacs) is certainly important -- after all, everybody has to eventually fix something remotely -- but just like languages, some editors look or feel better. My basic credo is that I want to find the absolute least resistance between what I see in my head and what eventually gets run. This applies to my office chair, my keyboard, my monitor, operating system, editor, language, SCM, deployment manager, etc. Every layer provides resistance that must be accounted for. Because in the end, I'm spending 8+ hours a day, 5+ days a week looking at and working on this setup; I might as well be comfortable. Personally, I use TextMate for practically all the code I write. I don't actually use the features that generally draw people to TextMate (SCM integration, macros for automating certain tasks in particular languages/frameworks, etc.) at all. I just like the way it looks, it's relatively lean, and I can easily cut and paste (which is my major knock on the character-based editors). I have a mouse, dammit, so let me use it. I also use NetBeans sometimes, although, honestly, it's only when I need to run SQL queries against JDBC databases anymore. If I were a real developer (meaning I wrote code intended to be compiled, etc.), I couldn't imagine not using something like NetBeans or Eclipse to automate some of the tedium. -Ross. On Wed, Jan 6, 2010 at 9:23 AM, Bill Dueber b...@dueber.com wrote: On Wed, Jan 6, 2010 at 8:53 AM, Joel Marchesoni jma...@email.wcu.eduwrote: I agree with Dan's last point about avoiding using a special IDE to develop with a language. I'll respectfully, but vehemently, disagree. I would say avoid *forcing* everyone working on the project to depend on a special IDE -- avoid lock-in. Don't avoid use. There's a spectrum of how much an editor/environment can know about a program. 
At one end is Smalltalk, where the development environment *is* the program. At the other end is something like LISP (and, to an extent, Ruby) where so little can be inferred from the syntax of the code that a smart IDE can't actually know much other than how to match parentheses. For languages where little can be known at compile time, an IDE may not buy you very much other than syntax highlighting and code folding. For Java, C++, etc., an IDE can know damn near everything about your project and radically up your productivity -- variable renaming, refactoring, context-sensitive help, jump-to-definition, method-name completion, etc. It really is a difference that makes a difference. I know folks say they can get the same thing from vim or emacs, but at that level those editors are no less complex (and a good deal more opaque) than something like Eclipse or NetBeans unless you already have a decade of experience with them. If you're starting in a new language, try a couple editors, too. Both Eclipse and NetBeans are free and cross-platform, and have support for a lot of languages. Editors like Notepad++, EditPlus, TextMate, jEdit, and BBEdit can all do very nice things with a variety of languages. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Online PHP course?
Seems to me that Dan's Hacker 101/201 preconfs fall into this sort of category. I think it would be really useful to see at a conference that didn't already appeal to the hacker set, like CiL or LITA or something. Even Access. -Ross. On Wed, Jan 6, 2010 at 2:20 PM, Tim Spalding t...@librarything.com wrote: I wonder if Code4Lib would ever be a good outlet for online programming tutorials or hack sessions. I mean, get 10 people on Etherpad or CodeArmy together, and Skype, and you could learn a lot, and do a lot. Tim
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
I realize you didn't want to start a religious war nor were you interested in the abstract reasons people chose a particular language, that being said... I honestly think choosing the best* development language is very similar to how one settles on politics, religion, diet, etc. Environment plays a part, of course, but, in the end, what generally works best is the language that jibes best with you and your personality. Since you've dabbled with several different languages, you've had to have come across this - some languages just feel better than others. This is, however, an entirely personal choice. Dan Chudnov, for example, seems to think in Python. When I tried Python, it never really clicked -- I muddled through a few projects but never really got it. I then got introduced to Ruby, everything made sense, and I never looked back. I recently did a project in Groovy/Grails and my takeaway was that it was a scripting language that only somebody that had spent their career as a Java developer could love. My coworker (who has spent his career as a Java developer) LOVES Groovy. He thinks Ruby is a Fisher-Price language. To each their own. Since you don't seem to have institutional constraints on what you can develop in, I would recommend you try something like this: Take a handful of languages that look interesting to you and try writing a simple app to take some of your data, model it and shove it into Solr and make an interface to look at it. Solr's pretty perfect for this sort of project: it's super simple to work with and immediately gives you something powerful and versatile to wrap your app around. If you can't make something useful quickly around Solr, then move on to the next language because that one's not for you. If the ones that click happen to be PHP, Python or Ruby, well, there you go. If not, I, for one, look forward to your new Lua (or whatever) based discovery interface. 
Ultimately, any project you choose for your discovery interface is going to require a lot of customization to make it work the way you want -- the key is finding the environment that least stands in the way of turning what's in your head into a working app. Good luck, -Ross. On Tue, Jan 5, 2010 at 6:04 PM, marijane white marijane.wh...@gmail.com wrote: Greetings Code4Lib, Long time lurker, first time poster here. I've been turning over this question in my mind for a few weeks now, and Joe Hourcle's postscript in the Online PHP Course thread has prompted me to finally try to ask it. =) I'm interested in hearing how the members of this list have gone about choosing development platforms for their library coding projects and/or existing open source projects (i.e., VuFind vs. Blacklight). For example, did you choose a language you already were familiar with? One you wanted to learn more about? Does your workplace have a standard enterprise architecture/platform that you are required to use? If you have chosen to implement an existing open source project, did you choose based on the development platform or project maturity and features or something else? Some background -- thanks to my undergraduate computer engineering studies, I have a pretty solid understanding of programming fundamentals, but most of my pre-LIS work experience was in software testing and did not require me to employ much of what I learned programming-wise, so I've mostly dabbled over the last decade or so. I've got a bit of experience with a bunch of languages and I'm not married to any of them. I also kind of like having excuses to learn new ones. My situation is this: I would like to eventually implement a discovery tool at MPOW, but I am having a hell of a time choosing one. I'm a solo librarian on a content team at a software and information services company, so I'm not really tied to the platforms used by the software engineering teams here. 
I know a bit of Ruby, so I've played with Blacklight some, got it to install on Windows and managed to import a really rough Solr index. I'm more attracted to the features in VuFind, but I don't know much PHP yet and I haven't gotten it installed successfully yet. My collection's metadata is not in an ILS (yet) and not in MARC, so I've also considered trying out more generic approaches like ajax-solr (though I don't know a lot of javascript yet, either). I've also given a cursory look at SOPAC and Scriblio. My options are wide open, and I'm having a rough time deciding what direction to go in. I guess it's kind of similar to someone who is new to programming and attempting to choose their first language to learn. I will attempt to head off a programming language religious war =) by stating that I'm not really interested in the virtues of one platform over another, moreso the abstract reasons one might have for selecting one. Have any of you ever been in a similar situation? How'd you get yourself unstuck? If you haven't, what do
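Ross's "shove some of your data into Solr" exercise above can be sketched in a few lines of Ruby. Everything here is illustrative rather than prescriptive: the sample records and field names are fabricated, and the localhost URL assumes a stock Solr install exposing the JSON update handler (available since Solr 1.4; the exact path may differ in your setup).

```ruby
require 'json'
require 'net/http'
require 'uri'

# Toy records standing in for "some of your data" -- the field names
# (title_t, subject_t) are made-up dynamic-field-style names, not from
# any real schema.
records = [
  { id: '1', title_t: 'Cataloging rules',     subject_t: 'Cataloging' },
  { id: '2', title_t: 'Ruby for librarians',  subject_t: 'Programming' }
]

# Serialize a batch of documents as a Solr JSON update payload.
def solr_payload(docs)
  JSON.generate(docs)
end

# POST the payload to Solr. The URL is an assumption about a default,
# locally running instance -- adjust for your install.
def post_to_solr(docs, url = 'http://localhost:8983/solr/update/json?commit=true')
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.post(uri.request_uri, solr_payload(docs),
              'Content-Type' => 'application/json')
  end
end

puts solr_payload(records)
# post_to_solr(records)  # uncomment once Solr is actually running
```

If building and posting a payload like this feels quick and natural in a given language, that is roughly the "does it click?" test Ross describes.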
Re: [CODE4LIB] resource sharing/ill oss
Has anybody followed up on Relais' announcement that their products will be open-sourced: http://www.relais-intl.com/relais/home/Relais%20Products%20Go%20Open%20Source%20-%20Press%20Release.pdf ? OpenILL (which was written by the University of Winnipeg in ColdFusion) also seems to have disappeared. -Ross. On Mon, Jan 4, 2010 at 12:29 PM, Eric Lease Morgan eric_mor...@infomotions.com wrote: Do you know of any resource sharing/ILL open source software? Prospero seems like a likely candidate, but it also seems to have gone missing. [1] http://bones.med.ohio-state.edu/prospero/ -- Eric Lease Morgan
Re: [CODE4LIB] T-shirt Design Contest
I've asked it before, I'll ask it again. Can we add the Roy Thong(tm)? -Ross. On Mon, Jan 4, 2010 at 1:02 PM, Smith,Devon smit...@oclc.org wrote: There is a cafepress store front for code4lib. There's nothing in it at the moment. http://www.cafepress.com/code4lib Last year I suggested that all tshirts be put in the store. Then I forgot all about it. Oops, my bad. Tentative guidelines: - The contest and the store are separate. You can enter the contest and not have your shirt in the store. - Shirts will be sold at cost for now. - To get your design in the store, send it to code4libcafepr...@decasm.com. By sending it to this address, you agree to the following terms: Designs submitted to the code4lib cafepress store will be sold for amounts to be determined by the code4libcon mailing list. Any revenue generated beyond costs will be spent according to voting on that list. You retain copyright, but grant permission to the code4lib community to sell any product available on cafepress with the submitted design. These guidelines are tentative and subject to change. You agree to be cool about that. For best results, tshirts for the cafepress store should be designed with this template: http://www.cafepress.com/content/si/temp_10x10_apparel.zip /dev -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Rosalyn Metz Sent: Monday, January 04, 2010 7:23 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] T-shirt Design Contest correct me if i'm wrong, but didn't someone already set up a store. i distinctly remember there being a roy tennant thong (as do others if you google it). it appears to have gone away though... On Sun, Jan 3, 2010 at 5:51 PM, Kevin S. Clarke kscla...@gmail.com wrote: I like that idea (and the idea of it as something that exists apart from the conference budget, but perhaps funds scholarships in the following year). 
I think last year someone suggested putting all the t-shirt submissions in there (not just the winning one - It lets folks buy the conference shirt, but also others that might appeal to them). I think anyone in the community could register http://shop.cafepress.com/lisforge and manage it as a means to fund scholarships (or contribute in some other way - depending on the amount raised). Kevin On Sun, Jan 3, 2010 at 5:32 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: We've talked before about setting up a code4lib CafePress store. Maybe we've already done it? It's an idea, at least. -Mike On Sun, Jan 3, 2010 at 16:40, Christina Salazar christinagama...@gmail.com wrote: Y'know... I think y'all should order extras and sell and ship them to those of us who cannot attend. I love my past conference t-shirts and they get some interesting reactions when I wear 'em. I'd buy any one of these designs. Seems like you might be able to make a bit of dough for scholarships and whatnot... Christina Salazar On Sun, Jan 3, 2010 at 12:23 PM, Patrick Hochstenbach patrick.hochstenb...@ugent.be wrote: Hello All, Here is the Inkscape entry designed by my lovely wife :) Greetings from Belgium, P@ Skype: patrick.hochstenbach Patrick Hochstenbach Software Architect University Library +32(0)92647980 Ghent University * Rozier 9 * 9000 * Gent
Re: [CODE4LIB] T-shirt Design Contest
I note they also have a boxer short option -- so, I'll up the ante to an entire line of Roy Tennant Undergarments -Ross. On Mon, Jan 4, 2010 at 10:31 PM, Ross Singer rossfsin...@gmail.com wrote: I've asked it before, I'll ask it again. Can we add the Roy Thong(tm)? -Ross. On Mon, Jan 4, 2010 at 1:02 PM, Smith,Devon smit...@oclc.org wrote: There is a cafepress store front for code4lib. There's nothing in it at the moment. http://www.cafepress.com/code4lib Last year I suggested that all tshirts be put in the store. Then I forgot all about it. Oops, my bad. Tentative guidelines: - The contest and the store are separate. You can enter the contest and not have your shirt in the store. - Shirts will be sold at cost for now. - To get your design in the store, send it to code4libcafepr...@decasm.com. By sending it to this address, you agree to the following terms: Designs submitted to the code4lib cafepress store will be sold for amounts to be determined by the code4libcon mailing list. Any revenue generated beyond costs will be spent according to voting on that list. You retain copyright, but grant permission to the code4lib community to sell any product available on cafepress with the submitted design. These guidelines are tentative and subject to change. You agree to be cool about that. For best results, tshirts for the cafepress store should be designed with this template: http://www.cafepress.com/content/si/temp_10x10_apparel.zip /dev -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Rosalyn Metz Sent: Monday, January 04, 2010 7:23 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] T-shirt Design Contest correct me if i'm wrong, but didn't someone already set up a store. i distinctly remember there being a roy tennant thong (as do others if you google it). it appears to have gone away though... On Sun, Jan 3, 2010 at 5:51 PM, Kevin S. 
Clarke kscla...@gmail.com wrote: I like that idea (and the idea of it as something that exists apart from the conference budget, but perhaps funds scholarships in the following year). I think last year someone suggested putting all the t-shirt submissions in there (not just the winning one - It lets folks buy the conference shirt, but also others that might appeal to them). I think anyone in the community could register http://shop.cafepress.com/lisforge and manage it as a means to fund scholarships (or contribute in some other way - depending on the amount raised). Kevin On Sun, Jan 3, 2010 at 5:32 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: We've talked before about setting up a code4lib CafePress store. Maybe we've already done it? It's an idea, at least. -Mike On Sun, Jan 3, 2010 at 16:40, Christina Salazar christinagama...@gmail.com wrote: Y'know... I think y'all should order extras and sell and ship them to those of us who cannot attend. I love my past conference t-shirts and they get some interesting reactions when I wear 'em. I'd buy any one of these designs. Seems like you might be able to make a bit of dough for scholarships and whatnot... Christina Salazar On Sun, Jan 3, 2010 at 12:23 PM, Patrick Hochstenbach patrick.hochstenb...@ugent.be wrote: Hello All, Here is the Inkscape entry designed by my lovely wife :) Greetings from Belgium, P@ Skype: patrick.hochstenbach Patrick Hochstenbach Software Architect University Library +32(0)92647980 Ghent University * Rozier 9 * 9000 * Gent
Re: [CODE4LIB] SVN/Mercurial hosting
Also, Google Code offers both HG and SVN support. http://code.google.com/projecthosting/ I have several projects there (although I haven't used Mercurial) and certainly find it a lot less frustrating than admin'ing Trac. -Ross. On Wed, Dec 16, 2009 at 2:39 PM, Mark A. Matienzo m...@matienzo.org wrote: Hi Yitzchak, I've been pretty happy with using BitBucket [1] to host Mercurial repositories. It doesn't have Trac, but it does have its own decently featured issue tracker, commit log viewer, and wiki system. The free plan is generous enough for you to get started. [1] http://bitbucket.org/ Mark A. Matienzo Applications Developer, Strategic Planning The New York Public Library On Wed, Dec 16, 2009 at 2:22 PM, Yitzchak Schaffer yitzchak.schaf...@gmx.com wrote: Hello all, As I was considering whether to migrate our SVN repositories to Mercurial (or possibly Bazaar) so as to allow for distributed control (like if I'm on the train or otherwise off the grid), I got word from our IT higher-ups that they want us to stop hosting our code on our domain and server. Before I start trekking around looking for hosting, does anyone in the crowd here have a server set up, and is potentially willing to host Trac+SVN or Trac+HG for our open-source projects? We currently have two. Alternately, I'd love to hear suggestions on regular hosting providers - particularly for Trac+Mercurial. Many thanks, -- Yitzchak Schaffer Systems Manager Touro College Libraries 33 West 23rd Street New York, NY 10010 Tel (212) 463-0400 x5230 Fax (212) 627-3197 Email yitzchak.schaf...@gmx.com Access Problems? Contact systems.libr...@touro.edu
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
I suppose it would be helpful to actually know the problem we're trying to solve here (I mean, a lot of people, including myself, are throwing out solutions to a problem that's never actually been defined). Ethan, what, exactly, are you trying to do? Do you want authorized headings? Or do you want LCSH that appears in the wild? -Ross. On Tue, Dec 8, 2009 at 10:35 AM, Ed Summers e...@pobox.com wrote: On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote: Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add facets to the authority entry when forming the subject heading. One could do a left-anchored match against actual headings, and that might provide some interesting statistics. Yes, using the actual headings extracted from bibliographic data seems to be a better approach. It's easier to rank them, and as Karen points out you get the actual post-coordinated headings, not just the headings LC has decided to establish authority records for. //Ed
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
It has an OpenSearch interface: http://id.loc.gov/authorities/opensearch But I don't think there's a way to explicitly limit to, say, the beginning of a label. lcsubjects.org has a SPARQL interface where you could use a regex filter on the labels: http://api.talis.com/stores/lcsh-info/services/sparql But it's probably not going to be fast enough for what you're talking about. There are around 647,500 distinct labels in the dump files, so it might be easier and better to just grab the n-triples file, pull the lines with http://www.w3.org/2004/02/skos/core#prefLabel and http://www.w3.org/2004/02/skos/core#altLabel and shove them in a local data store. On the other hand, if you don't care where your autocomplete string is coming from in the label, you could try: http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Av+||+altlabel%3Av&max=10&offset=0&sort=&xsl=&content-type= http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Avi+||+altlabel%3Avi&max=10&offset=0&sort=&xsl=&content-type= http://api.talis.com/stores/lcsh-info/items?query=preflabel%3Avir+||+altlabel%3Avir&max=10&offset=0&sort=&xsl=&content-type= etc. -Ross. On Mon, Dec 7, 2009 at 10:46 AM, Ethan Gruber ewg4x...@gmail.com wrote: Hi all, I have a need to integrate the LCSH terms into a web form that uses auto-suggest to control the vocabulary. Is this technically possible with the id.loc.gov service? I can curl a specific id to view the rdf, but I would need to know the specifics of the search index on the site to feed the auto-suggest. For example, when the user types va in the box, the results should filter all subject headings that begin with va. I can certainly accomplish this by indexing the ~400 meg XML file into Solr and using TermsComponent to filter terms dynamically as the user types, but I'd rather use the LOC's service if possible. So my question is: has anyone successfully done this before in the way I described? Thanks, Ethan Gruber University of Virginia Library
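Ross's "grab the n-triples, pull the label lines, shove them in a local data store" suggestion is simple enough to sketch in Ruby. The triples below are fabricated samples in the style of the LCSH dumps (the URIs and labels are illustrative only); a real run would stream the full .nt file and persist the results rather than keep them in memory.

```ruby
# SKOS predicates used for labels in the LCSH dumps.
PREF = '<http://www.w3.org/2004/02/skos/core#prefLabel>'
ALT  = '<http://www.w3.org/2004/02/skos/core#altLabel>'

# Fabricated sample triples standing in for the real dump file.
sample = <<~NT
  <http://id.loc.gov/authorities/sh00000001> #{PREF} "Vampires"@en .
  <http://id.loc.gov/authorities/sh00000002> #{PREF} "Violins"@en .
  <http://id.loc.gov/authorities/sh00000002> #{ALT} "Fiddles"@en .
NT

# Build uri => [labels]; a real run would do sample = File.foreach('lcsh.nt').
labels = Hash.new { |h, k| h[k] = [] }
sample.each_line do |line|
  if (m = line.match(/^(<[^>]+>)\s+<[^>]+(?:prefLabel|altLabel)>\s+"([^"]+)"/))
    labels[m[1]] << m[2]
  end
end

# Left-anchored, case-insensitive autocomplete over all labels,
# returning [label, uri] pairs.
def suggest(labels, prefix)
  labels.flat_map do |uri, ls|
    ls.grep(/\A#{Regexp.escape(prefix)}/i).map { |l| [l, uri] }
  end
end

p suggest(labels, 'Vi')  # left-anchored match, as in Ethan's "va" example
```

For production use the same hash could be loaded into a key-value store or a Solr index with TermsComponent, as Ethan describes.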
Re: [CODE4LIB] calling another webpage within CGI script - solved!
On Tue, Nov 24, 2009 at 11:18 AM, Graham Stewart graham.stew...@utoronto.ca wrote: We run many Library / web / database applications on RedHat servers with SELinux enabled. Sometimes it takes a bit of investigation and horsing around but I haven't yet found a situation where it had to be disabled. setsebool and chcon can solve most problems and SELinux is an excellent enhancement to standard filesystem and ACL security. Agreed that SELinux is useful but it is a teetotal pain in the keister if you're ignorantly working against it because you didn't actually know it was there. It's sort of the perfect embodiment of the disconnect between the developer and the sysadmin. And, if this sort of tension interests you, vote for Bess Sadler's presentation at Code4lib 2010: Vampires vs. Werewolves: Ending the War Between Developers and Sysadmins with Puppet and anything else that interests you. http://vote.code4lib.org/election/index/13 -Ross Bringin' it on home Singer.
Re: [CODE4LIB] Assigning DOI for local content
On Mon, Nov 23, 2009 at 1:07 PM, Eric Hellman e...@hellman.net wrote: Does this answer your question, Ross? Yes, sort of. My question was not so much whether you can resolve handles via bindings other than HTTP (since that's one of the selling points of handles) as whether people actually use this in the real world. Of course, it may be impossible to answer that question since, by your example, such people may not actually be letting anybody know that they're doing that (although you would probably be somebody with insider knowledge on this topic). Also, with your use cases, would these services be impossible if the only binding was HTTP? Presumably dx.hellman.net would need to harvest its metadata from somewhere, which seems like it would leave a footprint. It also needs some mechanism to stay in sync with the master index. Your non-resolution service also seems to be looking these things up in realtime. Would a RESTful or SOAP API (*shudder*) not accomplish the same goal? Really, though, the binding argument is less the issue here than whether you believe http URIs are valid identifiers, since there's no reason a URI couldn't be dereferenced via other bindings, either. -Ross.
Re: [CODE4LIB] Assigning DOI for local content
On Mon, Nov 23, 2009 at 2:52 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Well, here's the trick about handles, as I understand it. A handle, for instance, a DOI, is 10.1074/jbc.M004545200. Well, actually, it could be: 10.1074/jbc.M004545200 doi:10.1074/jbc.M004545200 info:doi/10.1074/jbc.M004545200 etc. But there's still got to be some mechanism to get from there to: http://dx.doi.org/10.1074/jbc.M004545200 or http://dx.hellman.net/10.1074/jbc.M004545200 I don't see why it's any different, fundamentally, than: http://purl.hellman.net/?purl=http%3A%2F%2Fpurl.org%2FNET%2Fdoi%2F10.1074%2Fjbc.M004545200 besides being prettier. Anyway, my argument wasn't that Purl was technologically more sound than handles -- Purl services have a major single-point-of-failure problem -- it's just that I don't buy the argument that handles are somehow superior because they aren't limited to HTTP. What I'm saying is that there are plenty of valid reasons to value handles more than purls (or any other indirection service), but independence from HTTP isn't one of them. -Ross. While, for DOI handles, normally we resolve that using dx.doi.org, at http://dx.doi.org/10.1074/jbc.M004545200, that is not actually a requirement of the handle system. You can resolve it through any handle server, over HTTP or otherwise. Even if it's still over HTTP, it doesn't have to be at dx.doi.org, it can be via any handle resolver. For instance, check this out, it works: http://hdl.handle.net/10.1074/jbc.M004545200 Cause the DOI is really just a subset of Handles, any resolver participating in the handle network can resolve em. In Eric's hypothetical use case, that could be a local enterprise handle resolver of some kind. 
(Although I'm not totally sure that would keep your usage data private; the documentation I've seen compares the handle network to DNS, it's a distributed system, and I'm not sure in what cases handle resolution requests are sent 'upstream' by the handle resolver, and whether actual individual lookups are revealed by that or not. But in any case, when Ross suggests -- Presumably dx.hellman.net would need to harvest its metadata from somewhere, which seems like it would leave a footprint. It also needs some mechanism to stay in sync with the master index. -- my reading of this suggests this is _built into_ the handle protocol, it's part of handle from the very start (again, the DNS analogy, with the emphasis on the distributed resolution aspect), you don't need to invent it yourself. The details of exactly how it works, I don't know enough to say.) Now, I'm somewhat new to this stuff too, I don't completely understand how it works. Apparently hdl.handle.net can handle, er, deal with any handle globally, while presumably dx.doi.org can only deal with the subset of handles that are also DOIs. And apparently you can have a handle resolver that works over something other than HTTP too. (Although Ross argues, why would you want to? And I'm inclined to agree.) But it appears that the handle system is quite a bit more fleshed out than a simple purl server; it's a distributed, protocol-independent network. The protocol-independent part may or may not be useful, but it certainly seems like it could be; it doesn't hurt to provide for it in advance. The distributed part seems pretty cool to me. So if it's no harder to set up, maintain, and use a handle server than a Purl server (this is a big 'if', I'm not sure if that's the case), and handle can do everything purl can do and quite a bit more (I'm pretty sure that is the case)... why NOT use handle instead of purl? It seems like handle is a more fleshed out, robust, full-featured thing than purl. 
Jonathan
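Ross's indirection point is easy to see in code: the same handle string yields interchangeable HTTP entry points, and the purl-style form is just the same thing with URL-encoding. A quick Ruby sketch (note that dx.hellman.net and purl.hellman.net are the thread's hypothetical services, not real resolvers):

```ruby
require 'cgi'

# A DOI is a handle; these are just different HTTP front doors to it.
handle = '10.1074/jbc.M004545200'

doi_url = "http://dx.doi.org/#{handle}"      # the DOI proxy
hdl_url = "http://hdl.handle.net/#{handle}"  # the general handle proxy

# The purl-style indirection from the thread: the target URL is
# percent-encoded into a query parameter.
purl_style = 'http://purl.hellman.net/?purl=' +
             CGI.escape("http://purl.org/NET/doi/#{handle}")

puts doi_url
puts hdl_url
puts purl_style
```

All three are HTTP-level indirection over the same identifier, which is the heart of the "independence from HTTP isn't the differentiator" argument.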
Re: [CODE4LIB] Assigning DOI for local content
On Fri, Nov 20, 2009 at 2:23 PM, Eric Hellman e...@hellman.net wrote: Having incorporated the handle client software into my own stuff rather easily, I'm pretty sure that's not true. Fair enough. The technology is binding independent. So you are using and sharing handles using some protocol other than HTTP? I'm more interested in the sharing part of that question. What is the format of the handle identifier in this context? What advantage does it bring over HTTP? -Ross.
Re: [CODE4LIB] Assigning DOI for local content
Back in 2007, I had a different job, different email address and lived in a different state. Things change. If people are sending emails to ross.sin...@gatech.edu to fix the library web services, they are going to be sorely disappointed and should perhaps check http://www.library.gatech.edu/about/staff.php for updates. purl.org has been going through a massive architecture change for the better part of a year now -- which has finally been completed. It was a slightly messy transition, but they migrated from their homegrown system to one designed by Zepheira. I feel like predicting the demise of HTTP and worrying about a service's ability to handle other protocols is unnecessary hand-wringing. I still have a telephone (two, in fact). Both my cell phone and VOIP home phone are still able to communicate flawlessly with a POTS dial phone. My car still has an internal combustion engine based on petroleum. It still doesn't fly or even hover. My wall outlets still accept a plug made in the 1960s. PURLs themselves are perfectly compatible with protocols other than HTTP: http://purl.org/NET/rossfsinger/ftpexample The caveat being that the initial access point is provided via HTTP. But then again, so is http://hdl.handle.net/, which is, in fact, currently the only way in practice to dereference handles. My point is, there's a lot of energy, resources and capital invested in HTTP. Even if it becomes completely obsolete, my guess is I can still type "http://purl.org/dc/terms" in spdy://google.com/ and find something about what I'm looking for. -Ross. On Thu, Nov 19, 2009 at 12:18 PM, Han, Yan h...@u.library.arizona.edu wrote: Please explain in more details; that will be more helpful. It has been a while. Back in 2007, I checked PURL's architecture, and it was strictly handling web addresses only. Of course, the current HTTP protocol is not going to last forever, and there are other protocols on the Internet. The coverage of PURL is not enough. 
From PURL's website, it still says PURLs (Persistent Uniform Resource Locators) are Web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. I am not sure what web addresses means. http://www.purl.org/docs/help.html#overview says PURLs are Persistent Uniform Resource Locators (URLs). A URL is simply an address on the World Wide Web. We all know that the World Wide Web is not the Internet. What if an info resource can be accessed through other Internet protocols (FTP, VOIP, ...)? This is the limitation of PURL. PURL is doing re-architecture, though I cannot find more documentation. The Handle System is a general purpose distributed information system that provides efficient, extensible, and secure HDL identifier and resolution services for use on networks such as the Internet. http://www.handle.net/index.html Notice the difference in definition. Yan -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Wednesday, November 18, 2009 8:11 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Assigning DOI for local content On Wed, Nov 18, 2009 at 12:19 PM, Han, Yan h...@u.library.arizona.edu wrote: Currently DOI uses Handle (technology) with its social framework (i.e. administrative body to manage DOI). In a technical sense, PURL is not going to last long. I'm not entirely sure what this is supposed to mean (re: purl), but I'm pretty sure it's not true. I'm also pretty sure there's little to no direct connection between purl and doi despite a superficial similarity in scope. -Ross.
Re: [CODE4LIB] Assigning DOI for local content
On Wed, Nov 18, 2009 at 12:19 PM, Han, Yan h...@u.library.arizona.edu wrote: Currently DOI uses Handle (technology) with its social framework (i.e. an administrative body to manage DOI). In a technical sense, PURL is not going to last long. I'm not entirely sure what this is supposed to mean (re: purl), but I'm pretty sure it's not true. I'm also pretty sure there's little to no direct connection between purl and doi despite a superficial similarity in scope. -Ross.
Re: [CODE4LIB] holdings standards/protocols
On Mon, Nov 16, 2009 at 9:58 AM, Chris Keene c.j.ke...@sussex.ac.uk wrote: Looks like our Talis system can't do this using the same process :( No, holdings aren't exported to Zebra. That being said, the opacxml format could be pretty easily added to the jangle connector. There's also something similar (well, sort of) in Keystone. What exactly are you looking for? Does this functionality work with AquaBrowser implementations on Voyager or III? I guess what I'm asking is: is the Z39.50 holdings format exactly what you want, or would there be a more ideal format to use? The opac format gets pretty gnarly with serials, for example (of course, everything does). -Ross.
Re: [CODE4LIB] MARC8 in marc-ruby
ruby-marc does not have any capacity to convert MARC-8 in any ruby interpreter: MRI, JRuby, Rubinius, whatever. Given the amount of work required to include this (unless Mark Matienzo feels like hacking into ruby-marc what he did for pymarc), I think I'd need to see a really compelling need (that can't be solved by one of the options that Ed mentioned) before making this much of a priority. -Ross. On Mon, Nov 2, 2009 at 11:42 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I thought that marc-ruby did MARC8 already! Wait, does it just do it in the 'native' ruby interpreter, but not in jruby? I'm dealing with records in MARC8 now, I think, with marc-ruby, and it looked like the non-roman characters were coming across okay! I might need to go investigate my setup further now. Jonathan Ed Summers wrote: Hi Brendan: Ahh, the lovely MARC-8 :-) It's a fair bit of effort, I think. One approach could be to port the MARC8-to-Unicode functionality from pymarc [1,2]. It's only one-way, but that's normally what most sane people want to do anyhow. Another approach would be to look into wrapping yaz-iconv [3] from IndexData, which provides many more (and faster) MARC-related character mapping facilities. If you just want to get something done without extending ruby-marc, you can pre-process your data with yaz-marcdump and then throw it at ruby-marc. Or perhaps if you are in jruby-land you could use marc4j [4], which has MARC-8 support. I've cc'ed code4lib since someone else might have some better ideas. Thanks for writing. //Ed [1] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8.py [2] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8_mapping.py [3] http://www.indexdata.com/yaz/doc/yaz-iconv.html [4] http://marc4j.tigris.org/ On Fri, Oct 30, 2009 at 3:22 AM, Brendan Boesen bboe...@nla.gov.au wrote: Hi Guys, I guess this is the 'bug the authors if you need it' email. I'm trying to parse a MARC record and it contains Chinese characters. 
From the leader: 01051cam 2200265 a 4504 it looks like the record uses MARC8 encoding. I'm investigating a way to get a Unicode encoded one but that may not work out. What sort of effort do you think is involved in adding MARC8 support into marc-ruby? (And is there anything I could do to help with that?) Regards, Brendan Boesen National Library of Australia
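Incidentally, the eyeballing Brendan does on the leader can be automated: in MARC 21, leader byte 9 carries the character coding scheme -- blank means MARC-8, 'a' means UCS/Unicode. A stdlib-only Ruby sketch, using illustrative leader strings rather than Brendan's actual record:

```ruby
# MARC 21 leader byte 9 is the character coding scheme:
# ' ' means MARC-8, 'a' means UCS/Unicode (UTF-8).
def marc8?(leader)
  leader[9] == ' '
end

puts marc8?('01051cam  2200265 a 4500')  # blank at byte 9 -> MARC-8
puts marc8?('01051cam a2200265 a 4500')  # 'a' at byte 9   -> Unicode
```

A check like this could decide whether a file needs to go through a transcoding step (e.g. the yaz-marcdump pre-processing Ed suggests) before being handed to ruby-marc.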
Re: [CODE4LIB] Greenstone: tweaking Lucene indexing
Yitzchak, are you interested in actually searching the fulltext? Or just highlighting the terms? If you're only interested in highlighting, it might be a whole lot easier to implement this in javascript through something like jQuery: http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html That way you're not juggling mostly redundant Lucene indexes and trying to keep them synced. How are you getting your search results? Does Greenstone have some sort of search API that returns the highlighted results? Would it make a difference if you could add a field to the Lucene document (meaning, would you have access to it through your PHP API to Greenstone)? If so, you could probably do this pretty easily via one of the JVM scripting languages (Groovy, JRuby, Jython, Quercus -- PHP in the JVM) so you just have the single Lucene index instead of multiple. Another approach might be to serve the Lucene index via Solr [1] or Lucene-WS (http://lucene-ws.net/), which would allow you to skip Greenstone altogether for searching. Basically, I would try to avoid going the Zend_Lucene route if at all possible. -Ross. 1. http://www.google.com/search?q=solr+on+an+existing+lucene+index&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a On Tue, Sep 29, 2009 at 11:32 AM, Yitzchak Schaffer yitzchak.schaf...@gmx.com wrote: Erik Hatcher wrote: I'm a bit confused then. You mentioned that somehow Zend Lucene was going to help, but if you don't have the text to highlight anywhere then the Highlighter isn't going to be of any use. Again, you don't need the full text in the Lucene index, but you do need to get it from somewhere in order to be able to highlight it. Erik, I started to port the native Greenstone Java Lucene wrapper to PHP, so I could then modify it to add this feature, as I don't know Java. This would mean using Zend Lucene for the actual indexing implementation. 
My question is whether anyone's already done it, in Java or otherwise. Thanks for the clarification, -- Yitzchak Schaffer Systems Manager Touro College Libraries 33 West 23rd Street New York, NY 10010 Tel (212) 463-0400 x5230 Fax (212) 627-3197 Email yitzchak.schaf...@gmx.com
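If highlighting really is the whole requirement, it can also be done server-side with nothing but string substitution, independent of Greenstone, Lucene or Zend: wrap each matched term in a span and style it with CSS. A minimal Ruby sketch (the `hl` class name is arbitrary):

```ruby
# Wrap every occurrence of each search term in a highlight span.
# Case-insensitive match; the original casing of the text is preserved.
def highlight(text, terms)
  terms.reduce(text) do |marked, term|
    marked.gsub(/(#{Regexp.escape(term)})/i, '<span class="hl">\1</span>')
  end
end

puts highlight('Indexing with Lucene and Greenstone', ['lucene'])
# Indexing with <span class="hl">Lucene</span> and Greenstone
```

This sidesteps the index-syncing problem the same way the jQuery plugin does, just on the other side of the wire.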
Re: [CODE4LIB] A few Ruby MARC announcements
Thanks for pointing that out, Ed. Since, of course, the only thing worse than lies and damn lies are, as we know, statistics, let me give some context here :) These benchmarks were run on a 45MB MARCXML document with a little less than 17k records in it that I happened to have on my machine. Hopefully that helps clear up the numbers a bit (although probably not). I can definitely say that the process was not entirely scientific -- it was run on my work machine, during work hours, with other work-related applications running. But I ran them a couple of times each and they are pretty representative of the average. Thanks Ed (and also Kevin Clarke and Will Groppe) for making rubymarc in the first place, and thanks to Jonathan Rochkind and Bill Dueber for helping flesh out how these pluggable parsers/serializers should work. -Ross. On Thu, Sep 24, 2009 at 12:48 AM, Ed Summers e...@pobox.com wrote: Nice work Ross! Users of rubymarc might like to see the performance enhancements that motivated you to do the nokogiri integration: http://paste.lisp.org/display/87529 !!! //Ed On Wed, Sep 23, 2009 at 10:51 PM, Ross Singer rossfsin...@gmail.com wrote: Hi everybody, Apologies for the crossposting. I wanted to let people know that Ruby MARC 0.3.0 was just released as a gem. This version addresses the biggest complaint about Ruby MARC, which was the fact that it could only parse MARCXML with REXML, Ruby's native XML parser (which, if you've used it, you hate it). Now you can use Nokogiri (http://nokogiri.rubyforge.org/) or, if you're using JRuby, jrexml instead of REXML, if you want. This release *shouldn't* break anything. The rubyforge project is here: http://rubyforge.org/projects/marc The rdocs are here: http://marc.rubyforge.org/ The source is here: http://marc.rubyforge.org/svn/ To install: sudo gem install marc While I'm making MARC and Ruby related announcements, I'd like to point out a project I released a couple of weeks ago that sits on top of Ruby MARC, called enhanced-marc. 
It's basically a domain specific language for working with the MARC fixed fields and providing a set of objects and methods to more easily parse what the record is describing. For example:

require 'enhanced_marc'

reader = MARC::Reader.new('marc.dat')
records = []
reader.each do |record|
  records << record
end

records[0].class                     # => MARC::BookRecord
records[0].is_conference?            # => false
records[0].is_manuscript?            # => false
# Send a boolean true if you want human readable forms, rather than MARC codes.
records[0].literary_form(true)       # => "Non-fiction"
records[0].nature_of_contents(true)  # => ["Bibliography", "Catalog"]
records[1].class                     # => MARC::SoundRecord
records[1].composition_form(true)    # => "Jazz"
records[2].class                     # => MARC::MapRecord
records[2].projection(true)          # => ["Cylindrical", "Mercator"]
records[2].relief(true)              # => ["Color"]

The enhanced-marc project is here: http://github.com/rsinger/enhanced-marc To install it:

gem sources -a http://gems.github.com
sudo gem install rsinger-enhanced_marc

Let me know if you have any problems or suggestions with either of these. Thanks! -Ross.
[CODE4LIB] A few Ruby MARC announcements
Hi everybody, Apologies for the crossposting. I wanted to let people know that Ruby MARC 0.3.0 was just released as a gem. This version addresses the biggest complaint about Ruby MARC, which was the fact that it could only parse MARCXML with REXML, Ruby's native XML parser (which, if you've used it, you hate it). Now you can use Nokogiri (http://nokogiri.rubyforge.org/) or, if you're using JRuby, jrexml instead of REXML, if you want. This release *shouldn't* break anything. The rubyforge project is here: http://rubyforge.org/projects/marc The rdocs are here: http://marc.rubyforge.org/ The source is here: http://marc.rubyforge.org/svn/ To install: sudo gem install marc While I'm making MARC and Ruby related announcements, I'd like to point out a project I released a couple of weeks ago that sits on top of Ruby MARC, called enhanced-marc. It's basically a domain specific language for working with the MARC fixed fields and providing a set of objects and methods to more easily parse what the record is describing. For example:

require 'enhanced_marc'

reader = MARC::Reader.new('marc.dat')
records = []
reader.each do |record|
  records << record
end

records[0].class                     # => MARC::BookRecord
records[0].is_conference?            # => false
records[0].is_manuscript?            # => false
# Send a boolean true if you want human readable forms, rather than MARC codes.
records[0].literary_form(true)       # => "Non-fiction"
records[0].nature_of_contents(true)  # => ["Bibliography", "Catalog"]
records[1].class                     # => MARC::SoundRecord
records[1].composition_form(true)    # => "Jazz"
records[2].class                     # => MARC::MapRecord
records[2].projection(true)          # => ["Cylindrical", "Mercator"]
records[2].relief(true)              # => ["Color"]

The enhanced-marc project is here: http://github.com/rsinger/enhanced-marc To install it:

gem sources -a http://gems.github.com
sudo gem install rsinger-enhanced_marc

Let me know if you have any problems or suggestions with either of these. Thanks! -Ross.
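What enhanced-marc's record classes boil down to is dispatch on the MARC fixed fields; here is a stdlib-only sketch of the idea, keying on leader byte 6 (type of record). The mapping is abbreviated and the class names are plain strings, not the gem's actual API:

```ruby
# Abbreviated MARC 21 'type of record' (leader byte 6) dispatch -- the
# kind of decision enhanced-marc makes when handing back record classes.
# Not the full MARC 21 table.
RECORD_TYPES = {
  'a' => 'BookRecord',   # language material
  'e' => 'MapRecord',    # cartographic material
  'i' => 'SoundRecord',  # nonmusical sound recording
  'j' => 'SoundRecord'   # musical sound recording
}

def record_class(leader)
  RECORD_TYPES.fetch(leader[6], 'Record')
end

puts record_class('01051cam a2200265 a 4500')  # 'a' at byte 6 -> BookRecord
```

(The real gem consults more than one byte -- leader 7 distinguishes monographs from serials, for instance -- but the dispatch principle is the same.)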
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Owen, I might have missed it in this message -- my eyes are starting to glaze over at this point in the thread -- but can you describe how the input of these resources would work? What I'm basically asking is: what would the professor need to do to add a new citation for a 70-year-old book; a journal on PubMed; a URL to CiteSeer? How does their input make it into your database? -Ross. On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is, saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could be used to give some indication (but I suspect not unambiguously). But I still think what you want is simply a purl server. What makes you think you want OpenURL in the first place? But I still don't really understand what you're trying to do: deliver consistency of approach across all our references -- so are you using OpenURL for its more conventional use too, but you want to tack on a purl-like functionality to the same software that's doing something more like a conventional link resolver? I don't completely understand your use case. I wouldn't use OpenURL just to get a persistent URL - I'd almost certainly look at PURL for this. But I want something slightly different. I want our course authors to be able to use whatever URL they know for a resource, but still try to ensure that the link works persistently over time. I don't think it is reasonable for a user to have to know a 'special' URL for a resource - and this approach means establishing a PURL for all resources used in our teaching material whether or not it moves in the future - which is an overhead it would be nice to avoid. 
You can hit delete now if you aren't interested, but ... ... perhaps if I just say a little more about the project I'm working on it may clarify... The project I'm working on is concerned with referencing and citation. We are looking at how references appear in teaching material (esp. online) and how they can be reused by students in their personal environment (in essays, later study, or something else). The references that appear can be to anything - books, chapters, journals, articles, etc. Increasingly of course there are references to web-based materials. For print material, references generally describe the resource and nothing more, but for digital material references are expected not only to describe the resource, but also state a route of access to the resource. This tends to be a bad idea when (for example) referencing e-journals, as we know the problems that surround this - many different routes of access to the same item. OpenURLs work well in this situation and seem to me like a sensible (and perhaps the only viable) solution. So we can say that for journals/articles it is sensible to ignore any URL supplied as part of the reference, and to form an OpenURL instead. If there is a DOI in the reference (which is increasingly common) then that can be used to form a URL using DOI resolution, but it makes more sense to me to hand this off to another application rather than bake this into the reference - and OpenURL resolvers are reasonably set to do this. If we look at a website it is pretty difficult to reference it without including the URL - it seems to be the only good way of describing what you are actually talking about (how many people think of websites by 'title', 'author' and 'publisher'?). For me, this leads to an immediate confusion between the description of the resource and the route of access to it. So, to differentiate I'm starting to think of the http URI in a reference like this as a URI, but not necessarily a URL. 
We then need some mechanism to check, given a URI, what the URL is. Now I could do this with a script - just pass the URI to a script that checks what URL to use against a list and redirects the user if necessary. On this point Jonathan said "if the usefulness of your technique does NOT count on being inter-operable with existing link resolver infrastructure... PERSONALLY I would be using OpenURL, I don't think it's worth it" - but it struck me that if we were passing a URI to a script, why not pass it in an OpenURL? I could see a number of advantages to this in the local context: Consistency - references to websites get treated the same as references to journal articles; this means a single approach on the course side, with flexibility. Usage stats - we could collect these whatever, but if we do it via OpenURL we get this in the same place as the stats about usage of other scholarly material and could consider driving
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Given that the burden of creating these links is entirely on RefWorks / Telstar, OpenURL seems as good a choice as anything (since anything would require some other service, anyway). As long as the profs aren't expected to mess with it, I'm not sure that *how* you do the indirection matters all that much and, as you say, there are added bonuses to keeping it within SFX. It seems to me, though, that your rft_id should be a URI to the db you're using to store their references, so your CTX would look something like: http://res.open.ac.uk/?rfr_id=info:/telstar.open.ac.uk&rft_id=http://telstar.open.ac.uk/1234&dc.identifier=http://bbc.co.uk/ # not url encoded because I have, you know, a life. I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. This way your citations are unique -- somebody pointing at today's London Times frontpage isn't the same as somebody else's on a different day. While I'm shocked that I agree with using OpenURL for this, it seems as reasonable as any other solution. That being said, unless you can definitely offer some other service besides linking to the resource, I'd avoid the resolver menu completely. -Ross. On Tue, Sep 15, 2009 at 11:17 AM, O.Stephens o.steph...@open.ac.uk wrote: Ross - no, you didn't miss it. There are 3 ways that references might be added to the learning environment: An author (or realistically a proxy on behalf of the author) can insert a reference into a structured Word document from an RIS file. This structured document (XML) then goes through a 'publication' process which pushes the content to the learning environment (Moodle), including rendering the references from RIS format into a specified style, with links. 
An author/librarian/other can import references to a 'resources' area in our learning environment (Moodle) from an RIS file. An author/librarian/other can subscribe to an RSS feed from a RefWorks 'RefShare' folder within the 'resources' area of the learning environment. In general the project is focussing on the use of RefWorks - so although the RIS files could be created by any suitable s/w, we are looking specifically at RefWorks. How you get the reference into RefWorks is something we are looking at currently. The best approach varies depending on the type of material you are looking at: For websites, it looks like the 'RefGrab-it' bookmarklet/browser plugin (depending on your browser) is the easiest way of capturing website details. For books, probably a Union catalogue search from within RefWorks. For journal articles, probably a Federated search engine (SS 360 is what we've got). Any of these could be entered by hand of course, as could several other kinds of reference. Entering the references into RefWorks could be done by an author, but it is more likely to be done by a member of clerical staff or a librarian/library assistant. Owen Owen Stephens TELSTAR Project Manager Library and Learning Resources Centre The Open University Walton Hall Milton Keynes, MK7 6AA T: +44 (0) 1908 858701 F: +44 (0) 1908 653571 E: o.steph...@open.ac.uk -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: 15 September 2009 15:56 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Implementing OpenURL for simple web resources Owen, I might have missed it in this message -- my eyes are starting glaze over at this point in the thread, but can you describe how the input of these resources would work? What I'm basically asking is -- what would the professor need to do to add a new: citation for a 70 year old book; journal on PubMed; URL to CiteSeer? How does their input make it into your database? -Ross. 
On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could be used to give some indication (but I suspect not unambiguously) But I still think what you want is simply a purl server. What makes you think you want OpenURL in the first place? But I still don't really understand what you're trying to do: deliver consistency of approach across all our references -- so are you using OpenURL for it's more conventional use too, but you want to tack on a purl-like functionality to the same software that's doing something more like a conventional link resolver? I don't completely understand your use case
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Oh yeah, one thing I left off -- in Moodle, it would probably make sense to link to the URL in the <a> tag: <a href="http://bbc.co.uk/">The Beeb!</a> but use a javascript onMouseDown action to rewrite the link to route through your funky link resolver path, a la Google. That way, the page works like any normal webpage -- right mouse click, Copy Link Location gives the user the real URL to copy and paste -- but normal behavior funnels through the link resolver. -Ross. On Tue, Sep 15, 2009 at 11:41 AM, Ross Singer rossfsin...@gmail.com wrote: Given that the burden of creating these links is entirely on RefWorks / Telstar, OpenURL seems as good a choice as anything (since anything would require some other service, anyway). As long as the profs aren't expected to mess with it, I'm not sure that *how* you do the indirection matters all that much and, as you say, there are added bonuses to keeping it within SFX. It seems to me, though, that your rft_id should be a URI to the db you're using to store their references, so your CTX would look something like: http://res.open.ac.uk/?rfr_id=info:/telstar.open.ac.uk&rft_id=http://telstar.open.ac.uk/1234&dc.identifier=http://bbc.co.uk/ # not url encoded because I have, you know, a life. I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. This way your citations are unique -- somebody pointing at today's London Times frontpage isn't the same as somebody else's on a different day. While I'm shocked that I agree with using OpenURL for this, it seems as reasonable as any other solution. That being said, unless you can definitely offer some other service besides linking to the resource, I'd avoid the resolver menu completely. -Ross. 
On Tue, Sep 15, 2009 at 11:17 AM, O.Stephens o.steph...@open.ac.uk wrote: Ross - no you didn't miss it, There are 3 ways that references might be added to the learning environment: An author (or realistically a proxy on behalf of the author) can insert a reference into a structured Word document from an RIS file. This structured document (XML) then goes through a 'publication' process which pushes the content to the learning environment (Moodle), including rendering the references from RIS format into a specified style, with links. An author/librarian/other can import references to a 'resources' area in our learning environment (Moodle) from a RIS file An author/librarian/other can subscribe to an RSS feed from a RefWorks 'RefShare' folder within the 'resources' area of the learning environment In general the project is focussing on the use of RefWorks - so although the RIS files could be created by any suitable s/w, we are looking specifically at RefWorks. How you get the reference into RefWorks is something we are looking at currently. The best approach varies depending on the type of material you are looking at: For websites it looks like the 'RefGrab-it' bookmarklet/browser plugin (depending on your browser) is the easiest way of capturing website details. 
For books, probably a Union catalogue search from within RefWorks For journal articles, probably a Federated search engine (SS 360 is what we've got) Any of these could be entered by hand of course, as could several other kinds of reference Entering the references into RefWorks could be done by an author, but it more likely to be done by a member of clerical staff or a librarian/library assistant Owen Owen Stephens TELSTAR Project Manager Library and Learning Resources Centre The Open University Walton Hall Milton Keynes, MK7 6AA T: +44 (0) 1908 858701 F: +44 (0) 1908 653571 E: o.steph...@open.ac.uk -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: 15 September 2009 15:56 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Implementing OpenURL for simple web resources Owen, I might have missed it in this message -- my eyes are starting glaze over at this point in the thread, but can you describe how the input of these resources would work? What I'm basically asking is -- what would the professor need to do to add a new: citation for a 70 year old book; journal on PubMed; URL to CiteSeer? How does their input make it into your database? -Ross. On Tue, Sep 15, 2009 at 5:04 AM, O.Stephens o.steph...@open.ac.uk wrote: True. How, from the OpenURL, are you going to know that the rft is meant to represent a website? I guess that was part of my question. But no one has suggested defining a new metadata profile for websites (which I probably would avoid tbh). DC doesn't seem to offer a nice way of doing this (that is saying 'this is a website'), although there are perhaps some bits and pieces (format, type) that could
Re: [CODE4LIB] Implementing OpenURL for simple web resources
On Tue, Sep 15, 2009 at 12:06 PM, Eric Hellman e...@hellman.net wrote: Yes, you can. In this case, I say punt on dc.identifier, throw the URL in rft_id (since, Eric, you had some concern regarding using the local id for this?) and let the real URL persistence/resolution work happen with the by-ref negotiation. -Ross. On Sep 15, 2009, at 11:41 AM, Ross Singer wrote: I can't remember if you can include both metadata-by-reference keys and metadata-by-value, but you could have by-reference (rft_ref=http://telstar.open.ac.uk/1234&rft_ref_fmt=RIS or something) point at your citation db to return a formatted citation. Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/
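Mechanically, the context object this thread keeps sketching by hand is just a query string. A stdlib-only Ruby sketch of assembling one with the website's URI in rft_id -- the resolver base appears upthread, but the rfr_id value is a placeholder, and real KEV OpenURLs carry more keys than this:

```ruby
require 'uri'

# Build a simple OpenURL carrying the referenced website's URI as rft_id.
# The rfr_id value here is an illustrative placeholder.
def website_openurl(resolver_base, website_uri)
  params = {
    'url_ver' => 'Z39.88-2004',
    'rfr_id'  => 'info:sid/example.open.ac.uk',
    'rft_id'  => website_uri
  }
  "#{resolver_base}?#{URI.encode_www_form(params)}"
end

puts website_openurl('http://res.open.ac.uk/', 'http://bbc.co.uk/')
```

Unlike the hand-written examples upthread, URI.encode_www_form takes care of the URL encoding.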
Re: [CODE4LIB] FW: PURL Server Update 2
On Tue, Sep 1, 2009 at 7:51 PM, Edward M. Corrado ecorr...@ecorrado.us wrote: Thus I have to believe them that they did not have a compromised server and instead they had a hardware failure. I have no idea why they couldn't just restore from backup, which would at least have gotten them back to where they were as of the last backup (which presumably was at most a week old; if not, someone should have a lot of explaining to do to someone). I didn't want to join this speculation party, but here goes. It's quite possible that part of the problem here is that the significant hardware failure meant that the replacement was a completely different architecture (let's say, for argument's sake, that the server that failed was AS/400 and the replacement was Solaris on an Intel server) because IT policy (or, you know, reality) dictated that the old hardware would be replaced if it failed. So then we're not just talking about restoring from tape -- things need to be compiled -- there are perhaps problems with legacy C libraries, character sets, *whatever*. When I was working at Emory, we had a grant-funded project that indexed a handful of collections of SGML EAD files in an app called iSearch (http://www.etymon.com/tr.html#). When the (admittedly neglected) VA Linux server it ran on had a major problem, it was insanely non-trivial to get this completely orphaned application running in a contemporary operating system (in this case, RedHat). Old versions of iSearch /would not under any circumstances/ compile -- new ones couldn't read the old data. The application was down for -- I don't know -- months, IIRC. Granted, this was nowhere near the priority of GPO's PURL server -- but you can't stop time to solve these sorts of Catch-22s, either. Things happen. Catastrophes generally have the added advantage of ensuring they don't happen again for a while. -Ross.
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
On Wed, Aug 12, 2009 at 10:48 AM, Karen Coyle li...@kcoyle.net wrote: Ross Singer wrote: 3) What, specifically, is missing from DCTerms that would make a MODS ontology needed? What, specifically, is missing from Bibliontology or MusicOntology or FOAF or SKOS, etc. that justifies a new and, in many places, overlapping vocabulary? Would time be better spent trying to improve the existing vocabularies? MARC: 182 fields, 1711 subfields, 2401 fixed field values. DC: 59 properties. I see where you're going with this, but I'm not sure it's a fair critique. It's sort of on par with saying that a Dodge Grand Caravan is a more sophisticated vehicle than a Mini Cooper because it has more horsepower, 3 times as many cup holders and vastly more cubic footage in the interior. A Caravan /may/ be a more sophisticated vehicle, but I'm not sure a quick run over the specs can necessarily reveal that. One of the problems here is that it doesn't begin to address the DCAM -- these are 59 properties that can be reused among 22 classes, giving them different semantic meaning. Look at the sample records in MARCXML and DC at http://www.loc.gov/standards/marcxml and you will see how lossy it is. Now I think you know you're being a little misleading here. For one thing, it's using DC Elements and it's not doing /anything/ vaguely RDF-related. Unfortunately, I think it's examples like this that have led libraries to write DC off as next to worthless (and understandably!). Dublin Core is toothless and practically worthless in XML form. It is considerably more powerful when used in RDF, however, because vocabularies there play to their mutual strengths: in RDF, you generally don't use a schema in isolation. Now, you could argue that no one needs all of the detail in MARC, and I'm sure it could be reduced down to something more rational, plus there is redundancy in it, but for pity's sake, DC doesn't have a way to indicate the EDITION of a work. This is true. 
But this is also why I'm asking what is missing in DCTerms that would be available in MODS -- the win of RDF is that you aren't constrained by the limits of a particular schema. If a particular vocabulary gets you a fair ways towards representing your resource, but something is missing, it's perfectly reasonable (and expected) to plug in other vocabularies to fill in the gaps. For example, SKOS doesn't need to add coordinate properties to properly define locations. Instead, you pull in a vocabulary that is optimized for defining geographic place (say, wgs84) and, rather than suboptimally retrofitting a vocabulary designed for modeling thesauri, use one that is explicitly intended to model the resource at hand (and, preferably, only that). I think it's somewhat analogous to the notion of domain-specific languages: there's an abstraction between the resource and the most efficient way to access it. FOAF has both *surname* and *family name* and says: These are not currently stable or consistent... No sh*t. And try to clearly code a name like Pope John Paul II in FOAF. Oh, and death dates. No death dates in FOAF because you wouldn't have DEAD FRIENDS. But authors die. FOAF isn't the only vocabulary available to model people, and I'm hardly saying it's the answer here. I mean, MARC is complicated in this regard, too. Rodrigo Jimenez Hernandez Garcia. Liu Ming Chung. Names are hard. I think pretty much any schema is going to have to have rules and conventions to compensate for the variability of how different cultures prescribe identity. Maybe vCard would be better (maybe not). The Bio vocabulary might be a better option for defining biographical events (birth, death, etc.). It lacks some of the attributes that libraries use (flourishing dates, for example) and shares RDF's inherent disadvantage that it can't express inexact dates very well. I think a common misperception of RDF in library circles is that there is no vocabulary that does everything we need. 
Rather, I think that this is one of RDF's strengths: no vocabulary can successfully model the universe, so, instead, focus on the specifics. The library world takes the opposite approach, which tends to cause things to get shoehorned in to meet the shape of the model rather than be expressed in a way more naturally suited to the resource. -Ross.
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
On Wed, Aug 12, 2009 at 1:45 PM, Karen Coyleli...@kcoyle.net wrote: Ross Singer wrote: One of the problems here is that it doesn't begin to address the DCAM -- these are 59 properties that can be reused among 22 classes, giving them different semantic meaning. Uh, no. That's the opposite of what the DC terms are about. Each term has a defined range -- so the defined range of creator is Agent. It can only be used as an Agent. You don't mix and match, and you don't assign different semantics to the same property under different circumstances. Jason clarified what I meant much better than I did, but I will take this a step further -- the DC properties have ranges, but only 5 have a constraint on their domain. So while dct:creator has to point at a dct:Agent (or some equivalent), where the dct:creator property lives can be anything.

<dct:Location about="#RhodeIsland">
  <dct:title>Rhode Island</dct:title>
  <dct:creator>
    <dct:Agent about="#RogerWilliams">
      <dct:title>Roger Williams</dct:title>
      <dct:creator>
        <dct:Agent about="#JamesWilliams">
          <dct:title>James Williams</dct:title>
        </dct:Agent>
        <dct:Agent about="#AliceWilliams">
          <dct:title>Alice Williams</dct:title>
        </dct:Agent>
      </dct:creator>
    </dct:Agent>
  </dct:creator>
</dct:Location>

The definition of dct:title is: "A name given to the resource." So dct:title could be "re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards", "Ross Singer" or "Chattanooga, TN" depending on what resource we're talking about. Maybe semantics is a poor word choice, but I think "Ross Singer" as the title of an Agent resource or "Chattanooga, TN" as the title of a Location resource have some conceptual distinctions from "For Whom the Bell Tolls". The creation of Rhode Island also carries a different mental image than the creation of Roger Williams. It seems like context influences semantics, at least somewhat. Dublin Core is toothless and practically worthless in XML form. 
It is considerably more powerful when used in RDF, however, because they play to their mutual strengths, namely that in RDF, you generally don't use a schema in isolation. The elements in Dublin Core are the elements in Dublin Core. The serialization shouldn't really matter. But if you need to distinguish between title and subtitle, Dublin Core's http://purl.org/dc/terms/title doesn't work. What matters is the actual *meaning* of the term, and the degree of precision you need. You can't use http://purl.org/dc/terms/title for Mr. or Dr. in a name -- it has a particular meaning. And you can't use it for title proper as defined in library cataloging, because it doesn't have that meaning. It all depends on what you are trying to say. Again, Jason did a good job of explaining the difference. Dublin Core in XML (at least in every example I've ever seen) consists solely of literals. The values are text, not resources -- so in XML DC, not only would you be unable to attach, say, birth and death date properties to Roger Williams, you also wouldn't be able to say who his creators are. Going back to context defining semantics, I don't think it's unreasonable to say that dct:title does mean title distinct from subtitle if that's the expectation of how dct:title is to work within your vocabulary/class.

<ex:Book about="http://example.org/ex/1234">
  <dct:title>Zen and the Art of Motorcycle Maintenance</dct:title>
  <ex:subTitle>An Inquiry into Values</ex:subTitle>
</ex:Book>

The definition of dct:title is pretty ambiguous -- alternately, you might choose to use dct:title to contain the full title and define some other property for main title. Just because title doesn't have a clear definition doesn't mean rules can't be applied to it when used in a particular domain (assuming they conform to "the name of the resource"). This is true. 
But this is also why I'm asking what is missing in DCTerms that would be available in MODS -- The win of RDF is that you aren't constrained by the limits of a particular schema. If a particular vocabulary gets you a fair ways towards representing your resource, but something is missing, it's perfectly reasonable (and expected) to plug in other vocabularies to fill in the gaps. Exactly. But the range of available vocabularies today is quite limited. There are a lot of semantics that are used in libraries that I can't find in the available vocabularies. Eventually I think we will have what we need, but ... well, yesterday I was hunting all over for a data element for price. And the person who needed it didn't want to get into the complexity of ONIX. BIBO doesn't have it. DC doesn't have it. RDA doesn't have it. Something that simple. Well, Bibo doesn't have it because it has nothing to do with citations. GoodRelations (http://www.heppnetz.de/projects/goodrelations/) does, but I admit that its usage seems rather baroque: http://www4.wiwiss.fu-berlin.de/bookmashup/doc/offers/0596000278googleOffer6997796095130913016
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
Whew -- just hit discard on my last message. On Wed, Aug 12, 2009 at 9:07 PM, Karen Coyleli...@kcoyle.net wrote: then my question is: has B changed? In other words, is B of class X the same as B of class Y? (Assuming that both B's have the same URI.) B (for our purposes we'll say it's "http://example.org/ex/B") can claim it's of as many types as the assertor is willing to predicate (making up words all over this place) as long as none of the classes anywhere assert that they owl:disjointWith (or some similar != assertion) another adopted type. So:

<rdf:Description about="http://example.org/ex/B">
  <rdf:type resource="http://vocab.org/frbr/core#Manifestation" />
  <dct:title>Zen and the Art of the Motorcycle Maintenance</dct:title>
  <rdf:type resource="http://purl.org/ontology/bibo/Book" />
  <bibo:isbn10>0553277472</bibo:isbn10>
</rdf:Description>

Ok -- everything's still in the clear. We've asserted that this resource is a book and that, in FRBR terms, it's also a manifestation. Both of these assertions are true but they're talking about the same resource in different vocabularies -- basically they describe the same thing in different world views: the FRBR model has no knowledge (nor need for knowledge) of bibliographic metadata and vice versa. Now, should you append to this graph something like:

<rdf:type resource="http://vocab.org/frbr/core#Text" />

you've run aground. The FRBR schema claims that by being a Text (of course it makes no mention of what exactly that means) it implies also being an Expression, but it also defines that frbr:Expression owl:disjointWith frbr:Manifestation (and vice-versa): that is, your resource can't be both an Expression and Manifestation, which makes sense. Now, this doesn't mean that Books and Manifestations are the same thing, it's just that /this/ book also happens to be a manifestation. As far as your point about context goes, I think this comes down to trust, credibility and provenance. 
Even if you define special properties to contain specific parts of your data, there is no way to enforce it. For example, let's say our new RDA vocabulary has: rda:titleProper and rda:remainderOfTitle, and all ILMSes move to an RDA/RDF model (I mean, yes, we're wandering into fantasyland, just bear with me) and begin to store our resources using this as the main data model. Now let's say we have a stash of data we'd like to add to our collection: maybe it's an e-book collection or a set of aggregated OA e-journals, a la DOAJ. The providers of this data are told we need it in our new RDA format and they comply. Let's say, though, that they weren't discriminating enough to distinguish the titleProper from the remainderOfTitle internally but in an effort to comply with our request, put their string in rda:titleProper (it's got to go somewhere, after all) and call it a day. Uncertainty has crept into the mix. After all, there's nothing, technically, stopping me from entering "Zen and the art of motorcycle maintenance: an inquiry into values" all in the 245$a. I think that replacing dct:title with rda:titleProper (rather than declaring that when used in RDA, dct:title should be the proper title) won't drastically help the purity of our data (especially if one of the motivations of RDA and RDF is the promise of externally supplied data) and will have the consequence of being in a vocabulary off the radar for anybody not in a library (and therefore ignored). It's a tough call, though. -Ross.
Re: [CODE4LIB] [Fwd: [ol-tech] Modified RDF/XML api]
Karen, The Bio vocabulary might help with the birth/death dates: http://vocab.org/bio/0.1/.html And foaf:isPrimaryTopicOf http://xmlns.com/foaf/spec/#term_isPrimaryTopicOf might be a good way to relate to the wikipedia page. I don't have any recommendation for alternate names (and would be interested in knowing of any, myself). All this isn't to discourage using the RDA vocabulary for any of this, but my concern is that its complexity, lack of documentation and kitchen sink approach will be daunting, especially for people coming from outside the library domain. I sort of look at RDA as the ontology of last resort. -Ross. On Tue, Aug 11, 2009 at 12:24 PM, Karen Coyleli...@kcoyle.net wrote: OK! thanks. There must be some default operating there... RDF for authors is now on to do list! Here are the data elements available: name, alternate names, website, birth date, death date, wikipedia link. FOAF doesn't cover death dates... RDA has death dates, alternate names. Should FOAF be used where possible, adding in RDA to fill in? There are a lot of elements they have in common. kc Ed Summers wrote: On Tue, Aug 11, 2009 at 10:40 AM, Karen Coyleli...@kcoyle.net wrote: Ed, I have NO IDEA how you got to rdf/xml from the OL author link -- do tell, and I'll take a look! There is no RDF/XML export template for authors, but one could be created. The URI/URL is simply the address of the author page, and also considered the author identifier on OL. The nice thing about this linked data stuff is all you have to do is follow your nose:

e...@rorty:~$ curl --include --header "Accept: application/rdf+xml" http://openlibrary.org/a/OL1518080A
HTTP/1.1 200 OK
Content-Type: application/rdf+xml; charset=utf-8
Date: Tue, 11 Aug 2009 15:01:32 GMT
Server: lighttpd/1.4.19
Transfer-Encoding: chunked
Connection: Keep-Alive
Age: 0

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:ol='http://openlibrary.org/type/author'>
  <ol:name>Lawrence Lessig</ol:name>
  <ol:personal_name>Lawrence Lessig</ol:personal_name>
  <ol:key>/a/OL1518080A</ol:key>
  <ol:type>http://openlibrary.org/type/author.rdf</ol:type>
  <ol:id>5209974</ol:id>
</rdf:RDF>

//Ed -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] Long way to be a good coder in library
On Wed, Jul 22, 2009 at 8:54 AM, Jon Gormanjonathan.gor...@gmail.com wrote: As far as languages, I'd probably lean towards ruby or python for starters or maybe Java. Then move into php after you have a grasp of good programming practice. You'll also figure out more what you like to work on. Given the plaintive tone of the original post, I disagree with this advice. Development is almost solely based on confidence and experience (with the latter affecting the former and vice-versa). Good code is secondary. I would almost certainly say start out with a procedural scripting language (or at least a procedural approach) that is more common and Googleable (PHP immediately comes to mind). The nice thing about something like PHP, in my mind, is that it's incredibly easy to see immediate results without having any real idea of what's going on (that being said, I have _no_ idea what Wayne's background might be -- perhaps this advice is too novice). As many others have replied, it's so much easier to learn by solving an actual problem (rather than following the 'pet store' example in your tutorial) and, in my mind, PHP is the easiest way to get off the ground. Successes breed confidence to take on bigger projects, etc. Once you've realized that this stuff isn't rocket science, /then/ break out the theory, find a different language (perhaps more suited to the task at hand -- or not!) and think about good code. Rob Styles sent this to my delicious account the other day (I'm not sure what he was trying to tell me): http://cowboyprogramming.com/2007/01/18/the-seven-stages-of-programming/ which I think sums up the arc pretty well. -Ross.
[CODE4LIB] Fwd: [NGC4LIB] Integrating with your ILS through Web services and APIs
This seems a _far_ more appropriate list for these questions. -Ross. -- Forwarded message -- From: Breeding, Marshall marshall.breed...@vanderbilt.edu Date: Wed, Jul 22, 2009 at 9:53 PM Subject: [NGC4LIB] Integrating with your ILS through Web services and APIs To: ngc4...@listserv.nd.edu I am in the process of writing an issue of Library Technology Reports for ALA TechSource titled "Hype or reality: Opening up library systems through Web Services and SOA." Today almost all ILS products make claims regarding offering more openness through APIs, Web services, and through a service-oriented architecture (SOA). This report aims to look beyond the marketing claims and identify specific types of tasks that can be accomplished beyond the delivered interfaces through programmatic access to the system internals. As part of the research for this article I am soliciting feedback from libraries that have taken advantage of Web Services or other APIs in conjunction with their core Integrated Library System (ILS) to meet specific needs. I'm interested in hearing about how you might have been able to integrate library content and services into applications, extracted data, automated processes or other novel applications. Please tell me about your experiences with your ILS in regard to the APIs it offers:
- Do you feel like you can pretty much do anything you want with the system, or do you feel constrained?
- Are the APIs offered able to address all the data and functionality within the ILS?
- On the flip side, do you feel like your ILS is too closed?
- Do you find the APIs offered by the developer of the ILS to be well documented?
- What programming languages or other tools were you able to use to take advantage of these APIs?
- What level of programming proficiency is required: Systems librarian with scripting languages, software development engineer, or something in between?
- What's on your wish list? What kind of APIs would you like to see incorporated into your current or next ILS? 
- I'm interested in responses from those that use open source ILS products as well. Are you able to programmatically interact with the ILS?
- Do you consider your ILS as embracing a true service-oriented architecture? Systems vendors increasingly promote their ILS as SOA. Can you provide examples where the ILS does or does not exhibit traits of SOA in your environment?
While it's important for the ILS to offer support for standard protocols such as Z39.50, NCIP, and OAI, that's not the core of the issue here. What I'm looking for are APIs that allow the library to get at data and functionality not addressed by these protocols. Thanks in advance for sharing your experiences with ILS APIs for this report. I appreciate your assistance. -marshall Summary excerpt: Libraries increasingly need to extract data, connect with external systems, and implement functionality not included with the delivered systems. Rather than being reliant on the product's developers for enhancements to meet these needs, libraries increasingly demand the ability to exploit their systems using APIs, Web Services, or other technologies. Especially in libraries that exist in complex environments where many different systems need to interact, the demand for openness abounds. As libraries develop their IT infrastructure, it's imperative to understand the extent to which their automation products are able to interoperate and thrive in this growing realm of Web services. This report aims to assess the current slate of major library automation systems in regard to providing openness through APIs, Web Services, and the adoption of SOA. Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Editor, Library Technology Guides http://www.librarytechnology.org 615-343-6094
Re: [CODE4LIB] rdf files as linked data
Oops, scratch my warning at the end of point 5. It shouldn't affect the point 1 strategy at all. Like I said, httpRange-14 is confusing :) -Ross. On Mon, Jul 20, 2009 at 10:58 PM, Ross Singerrossfsin...@gmail.com wrote: I'll pile on with a couple of other things:
1. I second Ed's point about conneg: http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221 should probably return a 300 code with pointers to your various file types.
2. Replace dc with dcterms (http://purl.org/dc/terms/)
3. While Ed's point about linking to other resources would be nice, first I'd focus on the resources you have and can control. Rather than a literal for dc:creator, can you mint URIs for all of your authors? How about subjects?
4. Your URIs in your rdf:Description[@rdf:about] aren't terribly helpful on their own. Either give the full URI here or add an xml:base="http://infomotions.com/etexts/literature/english/1500-1599/" attribute to the tag -- that should improve things.
5. I think your dc:contributor tag might be running aground of httpRange-14 -- I'm pretty sure you didn't help Thomas More write his story. This, I think, is the absolute hardest thing to get right with RDF/LOD. A nice example of sidestepping this sort of collision is Toby Inkster's RDF-ification of Amazon Web Services: http://purl.org/NET/book/isbn/0140449108#book -- in this example, the 'record metadata' lives at the base URI (http://purl.org/NET/book/isbn/0140449108) and the real world object lives at http://purl.org/NET/book/isbn/0140449108#book. This way Toby can claim responsibility for making the data available, but not assert that he had any part in creating the work itself. The two resources are linked to each other, but are each unique, independent URIs. If you do do this, though, it messes up what I said in point #1.
The concordances would also be really neat to see -- building off of WordNet would be pretty cool with all of these old texts. Good luck, it's great to see. -Ross. 
On Mon, Jul 20, 2009 at 10:04 PM, Ed Summerse...@pobox.com wrote: Heya Eric: The main thing you'd want to do would be to make sure URIs like: http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221 returned something useful for both people and machine agents. The nitty gritty details of how to do this can roughly be found in Cool URIs for the Semantic Web [1], or How to Publish Linked Data [2]. A slight variation would be to use something like RDFa [3] to embed metadata in your HTML docs, or GRDDL [4] to provide a stylesheet to transform some HTML to RDF. The end goal of linked data is to provide contextual links from your stuff to other resources on the web, aka timbl's rule #4: "Include links to other URIs. so that they can discover more things." [5] So for example you might want to assert that:

<http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221> owl:sameAs <http://dbpedia.org/page/Utopia_(book)> .

or:

<http://infomotions.com/etexts/literature/english/1500-1599/more-utopia-221> dcterms:creator <http://dbpedia.org/resource/Thomas_More> .

It's when you link out to other resources on the web that things get interesting, more useful, and potentially more messy :-) For example instead of owl:sameAs perhaps an assertion using FRBR or RDA would be more appropriate. Thanks for asking the question. The public-lod list [6] at the w3c is also a really friendly/helpful group of people making data sets available as linked-data. //Ed [1] http://www.w3.org/TR/cooluris/ [2] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ [3] http://www.w3.org/TR/xhtml-rdfa-primer/ [4] http://www.w3.org/TR/grddl-primer/ [5] http://www.w3.org/DesignIssues/LinkedData.html [6] http://lists.w3.org/Archives/Public/public-lod/
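"Returning something useful for both people and machine agents" usually comes down to content negotiation on the Accept header. A deliberately naive sketch of just the dispatch step (no q-value parsing, and the filenames are hypothetical):

```python
def choose_representation(accept_header: str) -> str:
    """Pick a document for a resource URI based on the client's Accept header.

    A simplification: real content negotiation parses q-values and
    typically answers with a redirect to the chosen document.
    """
    preferences = [
        ("application/rdf+xml", "more-utopia-221.rdf"),
        ("text/html", "more-utopia-221.html"),
    ]
    for media_type, document in preferences:
        if media_type in accept_header:
            return document
    return "more-utopia-221.html"  # browsers and unknown agents get HTML

print(choose_representation("application/rdf+xml"))
```

This mirrors the follow-your-nose pattern in the curl example elsewhere in this digest: a client sending Accept: application/rdf+xml is steered to the RDF document, everyone else to the HTML.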
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
On Wed, Jul 15, 2009 at 8:57 AM, Ray Denenberg, Library of Congressr...@loc.gov wrote: Ross, if you're talking about the ISO 20775 xml schema: http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd It's free. It's also not a spec, it's a schema. If the expectation is that people are actually going to adopt a standard from merely looking at an .xsd, my prediction is that this will go nowhere. I mean, I'm wrong a lot, but I feel pretty good about this reading from my crystal ball. -Ross.
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
Well, it's not a great example, because I don't have a 'counter-example', but I think it remains to be seen if ISO 20775 goes anywhere if it, too, remains behind a pay wall. If an open spec were to come along that allowed the transfer of holdings and availability information that was decent and simple, it would basically render ISO 20775 irrelevant (if the pay wall doesn't already). RDA, I think, might also suffer from this problem. -Ross. On Tue, Jul 14, 2009 at 10:35 AM, Walter Lewislew...@hhpl.on.ca wrote: William Wueppelmann wrote: [snip] I'm not entirely sure that TCP/IP and the other IETF RFCs became established because of restrictions placed on OSI. I was under the impression that OSI was also insanely complicated and that the IETF standards were much cheaper to implement from a technical standpoint. And, from a product standpoint, in the mid-90s, there were still a lot of bets being placed on closed online services like AOL, MSN, and Compuserve. Not to mention the book I once saw on MS Blackbird ... (MSN .0001?) which, thankfully, was abandoned before leaving the nest. Any examples closer to the library world? What I had been hoping for were data standards more in the library space. I've read ANSI's Z39.19, which deals with monolingual thesauri. (A copy lives here: http://www.slis.kent.edu/~mzeng/Z3919/8Z3919toc.htm) Near as I can tell, the parallel multilingual standard is ISO 5964 and is available at http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?ics1=01&ics2=140&ics3=20&csnumber=12159 for a fee of 168 Swiss francs (CHF) or ~$155 USD. I pay attention to the one, and never expect to read the other. This past week I was on the edge of another discussion of standards with associated controlled vocabularies (in the K-12 domain) where a criticism was raised that it wasn't Creative Commons with an Attribution requirement, else how could you teach it? 
That got me thinking about whether we shouldn't have already learned that lesson because the 'net largely runs on public RFCs, but wondered if I wasn't missing other examples inside our domain. Walter
Re: [CODE4LIB] OpenURL question
Stuart, The short answer is probably. The longer answer is that, yes, OpenURL is currently the best way to accomplish what you're looking for. That being said, I think your audience may make this a little more complicated and the solutions perhaps more fragile and hacky. Since you don't have an /institutional/ target audience (at least, that's the impression I get), this falls out of the sort of traditional OpenURL workflow. Generally, you have some a priori knowledge of the affiliation of the service user and can point your OpenURLs at that person's institutional link resolver (so they can get context services that are appropriate for them). When you don't have that, the general answer is to use COinS [1]. However, COinS themselves have no way to associate a person at a web browser with an institutional link resolver. Dan Chudnov wrote a couple years ago about using the OCLC Link Resolver Registry to handle this [2]: grab the user's IP address, do a lookup in the background, and rewrite the COinS to use the person's institutional link resolver, if there is a match. The problem is, because it's based on IP, it's quite possible there will not be a match. There's also the reality that lots and lots (perhaps the majority) of people have no access to a link resolver at all - do they just get left in the dark? It's also possible that some (much?) of what you'd be citing would be available in OA archives or Google Book Search or Open Library. This would make the case to also run a 'default' link resolver, such as the Umlaut [3], to find open web things. Conveniently, the Umlaut is engineered(*) to be able to handle the OCLC Resolver Registry and merge an external resolver (or resolvers) into the options. I fear this doesn't sound terribly encouraging, but good luck, -Ross. 
[1] http://ocoins.info/ [2] http://onebiglibrary.net/story/solving-the-appropriate-resolver-problem [3] http://wiki.code4lib.org/index.php/Umlaut * - Umlaut originally had this functionality, but due to a lack of perceivable need, it's gone a bit to seed. That being said, it's designed to do this and shouldn't be too difficult to reintegrate. On Mon, Jun 29, 2009 at 10:06 PM, stuart yeatesstuart.yea...@vuw.ac.nz wrote: We have an index of place names that we're considering digitising and ingesting into our collection (http://www.nzetc.org/). For each place name a series of bibliographic references (often including page #) list uses of the place name. We want to build a mapping from those bibliographic references to the documents:
* some of the documents are in our electronic collection
* some of the documents are in other people's electronic collections
* some of the documents are not online yet, but may soon be
Is OpenURL the right tool for this job? Is there an implementation / configuration that people suggest for this? cheers stuart -- Stuart Yeates http://www.nzetc.org/ New Zealand Electronic Text Centre http://researcharchive.vuw.ac.nz/ Institutional Repository
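For reference, the OpenURL side of this is mostly string assembly. A minimal sketch of building an OpenURL 1.0 KEV query for a book citation; the key set here is abbreviated, the sample metadata is invented, and any resolver base URL you prepend is site-specific:

```python
from html import escape
from urllib.parse import urlencode

def openurl_kev(metadata: dict) -> str:
    """Build an OpenURL 1.0 KEV query string for a book citation."""
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
    ]
    # Referent keys (rft.*) carry the citation itself.
    pairs += [("rft." + key, value) for key, value in metadata.items()]
    return urlencode(pairs)

query = openurl_kev({"btitle": "Utopia", "au": "Thomas More", "spage": "42"})
print(query)

# A COinS span is just this query, HTML-escaped, in an empty span's title:
coins = '<span class="Z3988" title="%s"></span>' % escape(query)
```

With a known resolver you would prepend its base URL to the query; without one, publishing the COinS span and letting the reader's tooling (or an IP lookup against a resolver registry, as discussed above) supply the resolver is the usual fallback.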
Re: [CODE4LIB] HTML mark-up in MARC records
On Tue, Jun 23, 2009 at 9:39 AM, Casey Bisson cbis...@plymouth.edu wrote: The mistake here is presuming that (X)HTML coded data isn't (or can't be) data. I think it's a greater mistake to assume that it will be. We are talking about putting semantics into a structured data source (by people who are trained in both the data format in particular and the tools to input data into it), why introduce another data format (out of context, I might add -- we're only talking about a single tag) which is much more likely to be adulterated by human input and the fallacies that come with being human? A semantic XHTML serialization of data is very different than ad-hoc usage of HTML for a local need. -Ross.
Re: [CODE4LIB] FW: [CODE4LIB] openurl.info ?
This was the historical (and annoying) behavior. At one point a redirect was added to http://openurl.info/registry that I suppose needs to be recreated. -Ross. On Sun, May 17, 2009 at 3:50 AM, Boheemen, Peter van peter.vanbohee...@wur.nl wrote: Hmm Roy but should it be pointing to the PURL site ? http://www.openurl.info/ = http://purl.oclc.org/ Peter Drs. P.J.C. van Boheemen Hoofd Applicatieontwikkeling en beheer - Bibliotheek Wageningen UR Head of Application Development and Management - Wageningen University and Research Library tel. +31 317 48 25 17 http://library.wur.nl P Please consider the environment before printing this e-mail -Oorspronkelijk bericht- Van: Code for Libraries namens Roy Tennant Verzonden: zo 17-5-2009 2:03 Aan: CODE4LIB@LISTSERV.ND.EDU Onderwerp: [CODE4LIB] FW: [CODE4LIB] openurl.info ? -- Forwarded Message From: Karen Wetzel kwet...@niso.org Reply-To: A discussion listserv for topics surrounding the Open URL NISO standard Z39.88. open...@oclc.org Date: Fri, 15 May 2009 12:11:01 -0400 To: open...@oclc.org Subject: Re: [CODE4LIB] openurl.info ? Greetings, I just wanted to send a quick follow-up on my last note to confirm that we've worked to fix this error and that the www.openurl.info http://www.openurl.info domain is now working again. If you are still experiencing problems with the URL, please do send me a note and I'll be sure to look into it. Again, I apologize on behalf of NISO for this error, and appreciate all your feedback and patience as we worked to resolve this problem. Truly, Karen -- Karen A. Wetzel Standards Program Manager National Information Standards Organization (NISO) One North Charles Street, Suite 1905 Baltimore, MD 21201 Tel.: 301-654-2512 Fax: 410-685-5278 E-mail: kwet...@niso.org On May 15, 2009, at 11:13 AM, Ray Denenberg, Library of Congress wrote: Yes apparently NISO is (was) the owner. I've sent them a note (and anyone else who feels so inclined should too: Email:nis...@niso.org mailto:Email:nis...@niso.org ). 
--Ray - Original Message - From: Venicio mailto:vrbu...@gmail.com To: open...@oclc.org Sent: Friday, May 15, 2009 11:05 AM Subject: Re: [CODE4LIB] openurl.info ? FYI: Domain ID:D2132192-LRMS Domain Name:OPENURL.INFO http://OPENURL.INFO Created On:10-May-2002 17:49:32 UTC Last Updated On:15-May-2009 14:56:44 UTC Expiration Date:10-May-2010 17:49:32 UTC Sponsoring Registrar:DSTR Acquisition PA I, LLC d/b/a Domainbank.com (R107-LRMS) Status:OK Registrant ID:C4373421-LRMS Registrant Name:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Registrant Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Registrant Street1:4733 Bethesda Ave. Registrant Street2:STE 300 Registrant Street3: Registrant City:Bethesda Registrant State/Province:MD Registrant Postal Code:20814 Registrant Country:US Registrant Phone:+1.3016542512 Registrant Phone Ext.: Registrant FAX: Registrant FAX Ext.: Registrant Email:nis...@niso.org mailto:email%3anis...@niso.org Admin ID:DOT-3Q02W1748WCF Admin Name:Pat Stevens Admin Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Admin Street1:4733 Bethesda Ave. Admin Street2: Admin Street3: Admin City:Bethesda Admin State/Province:MD Admin Postal Code:20814 Admin Country:BE Admin Phone:+32.3016542512 Admin Phone Ext.: Admin FAX: Admin FAX Ext.: Admin Email:nis...@niso.org mailto:email%3anis...@niso.org Billing ID:DOT-132FHTD2SCKP Billing Name:Patricia Stevens Billing Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Billing Street1:4733 Bethesda Ave. Billing Street2: Billing Street3: Billing City:Bethesda Billing State/Province:MD Billing Postal Code:20814 Billing Country:BE Billing Phone:+32.3016542512 Billing Phone Ext.: Billing FAX: Billing FAX Ext.: Billing Email:nis...@niso.org mailto:email%3anis...@niso.org Tech ID:DOT-IQIOP5LKRKM0 Tech Name:Pat Stevens Tech Organization:NISO - NATIONAL INFORMATION STANDARD ORGANIZATION Tech Street1:4733 Bethesda Ave. 
Tech Street2: Tech Street3: Tech City:Bethesda Tech State/Province:MD Tech Postal Code:20814 Tech Country:BE Tech Phone:+32.3016542512 Tech Phone Ext.: Tech FAX: Tech FAX Ext.: Tech Email:nis...@niso.org mailto:email%3anis...@niso.org Name Server:DNS.OCLC.ORG http://DNS.OCLC.ORG Name Server:DNS2.OCLC.ORG http://DNS2.OCLC.ORG Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: Name Server: On Fri, May 15, 2009 at 10:39 AM, Phil Adams p...@dmu.ac.uk wrote: I heard via twitter that: openurl.info http://openurl.info domain name expired on sunday! somebody messed up Regards, Philip Adams Senior Assistant Librarian (Electronic Services Development) De Montfort University Library 0116
[CODE4LIB] Fwd: OPEN POSITION: Linked Data in Digital Libraries
Seems like Linked Data + Library + Vienna might be of interest to somebody here. -Ross. -- Forwarded message -- From: Bernhard Haslhofer bernhard.haslho...@univie.ac.at Date: Fri, May 15, 2009 at 3:46 AM Subject: OPEN POSITION: Linked Data in Digital Libraries To: Linked Data community public-...@w3.org Hello, if somebody feels like moving to Vienna to work in a digital library project where we will definitely do some Linked Data research, please let me know. Best, Bernhard - The Multimedia Information Systems Group (http://www.cs.univie.ac.at/mis) at the University of Vienna / Austria is looking for an excellent candidate to work as a PhD researcher in the EU eContentPlus project EuropeanaConnect. The objective of EuropeanaConnect (http://www.europeanaconnect.eu/) is to deliver core components which are essential for the realization of the European Digital Library (Europeana) as a truly interoperable, multilingual and user-oriented service for all European citizens. The project will provide the technologies and resources to semantically enrich vast amounts of digital content in Europeana. This will enable semantically based content discovery including support for advanced searching and browsing, allowing for delivery of enhanced services and making Europeana content more accessible, reusable and exploitable. We expect the applicant to work in the following areas: - Web-based knowledge organization systems (e.g., Linked Data) - metadata registries - persistent digital object identifiers - multimedia annotations The ideal candidate holds a MS degree in Computer Science or related field and is able to consider both theoretical and practical/implementation aspects in her/his work. Fluent english communication and programming skills are fundamental requirements. 
Preferably the candidate has a background in one of the following fields: - semantic technologies (RDF, SKOS, etc.) - metadata management - multimedia computing The position starts as soon as possible and is full-time (40h/week) for the duration of the project until Oct 2011. Review of applications will begin immediately and will continue until the position is filled. The successful candidate will work closely with international partners and will have the opportunity to pursue PhD work within the scope of the project. We invite interested applicants to send their resume, including a pointer to their previous / current work and publications, to sekretar...@mminf.univie.ac.at, Reference No: 396/MIS/0109. The University of Vienna is an Equal Opportunities Employer. Women therefore are especially encouraged to apply. In case of equal qualification, women applying are to be given priority unless reasons specific to an individual male candidate tilt the balance in his favor according to judgments of the EU Court of Justice. - __ Research Group Multimedia Information Systems Department of Distributed and Multimedia Systems Faculty of Computer Science University of Vienna Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649 E-Mail: bernhard.haslho...@univie.ac.at WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
[CODE4LIB] Diebold-o-tron-o-matic IG
Hi everybody. We're probably 6 months (or less) from the voting season in Code4libya and I want to preemptively counter the catcalls, jeers, calls for the Drupal voting module, etc., before we're 4 days out from the first vote opening. So, if you're interested in participating in this, let me know. If you're interested in /leading/ this, /please/ let me know, because I'm perfectly happy just firing up the Diebold-o-tron-o-matic for another year, so if you've got a real bone to pick with how things work, stand and deliver. Thanks, -Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Tue, May 12, 2009 at 6:21 AM, Jakob Voss jakob.v...@gbv.de wrote: Ross Singer wrote:

<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>

I generally agree with this, but what about formats that aren't XML or RDF based? How do I also say that you can grab my text/x-vcard? Or my application/marc record? There is still lots of data I want that doesn't necessarily have these characteristics. In my blog posting I included a way to specify MIME types (such as text/x-vcard or application/marc) as URIs. According to RFC 2220 the application/marc type refers to "the harmonized USMARC/CANMARC specification" - whatever this is - so the MIME type can be used as a format identifier. For vCard there is an RDF namespace and a (not very nice) XML namespace: http://www.w3.org/2001/vcard-rdf/3.0# vcard-temp (see http://xmpp.org/registrar/namespaces.html) This is vCard as RDF, not vCard the format (which is text based). It would be the equivalent of saying "here's an hCard, it's the same thing, right?" although the reason I may be requesting a vCard in its native format is because I have a vCard parser or an application that consumes them (Exchange, for example). That depends whether you want to be taken seriously outside the library community and target the web as a whole or not. My point is that there's a step before that, possibly, where the theory behind unAPI, Jangle, whatever, is tested to even see if it's going in the right direction before writing it up formally as an RFC. I don't think the lack of adoption of unAPI has anything to do with the prose of its specification document. The RFC format is useful for later adopters, but people that, say, jumped on the Atom syndication format as a good idea didn't need an RFC first; they developed a spec, /then/ wrote the standard once they had an idea of how it needed to work. -Ross.
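A minimal sketch of how a client might consume a formats list like the one quoted above, keying on the proposed uri attribute rather than the short name. The sample document and the format_name_for_uri helper are illustrative assumptions, not part of the unAPI spec:

```python
# Sketch: pick the unAPI short name to request for a given format URI.
# The <formats> shape follows the example in the message above; the
# helper function is hypothetical.
import xml.etree.ElementTree as ET

UNAPI_NS = "http://unapi.info/"

formats_xml = """<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  <format name="vcard" type="text/x-vcard"/>
</formats>"""

def format_name_for_uri(xml_text, wanted_uri):
    """Return the short name to request for a given format URI, or None."""
    root = ET.fromstring(xml_text)
    for fmt in root.findall(f"{{{UNAPI_NS}}}format"):
        if fmt.get("uri") == wanted_uri:
            return fmt.get("name")
    return None

print(format_name_for_uri(formats_xml, "http://xmlns.com/foaf/0.1/"))  # foaf
```

Note that the vcard entry above carries only a MIME type, which is exactly the gap Jakob raises: a text-based format has no namespace URI to match on.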
Re: [CODE4LIB] Formats and its identifiers
On Mon, May 11, 2009 at 9:53 AM, Jakob Voss jakob.v...@gbv.de wrote: That's your interpretation. According to the schema, the MODS format *is* either a single mods-element or a modsCollection-element. That's exactly what you can refer to with the namespace identifier http://www.loc.gov/mods/v3. Agreed. The same is true, of course, of MARC and, by extension, MARCXML. Part of the format is that it can be one record or multiple. I don't think this is a particularly strong argument against using the namespace as an identifier. The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' does not identify the top level element but the MODS *format* (in any of the versions 3.0-3.4) itself. This format *includes* the top level element 'mods'. I'm not really sure of the changes between MODS v.3.0-3.3 -- are they basically backwards and forwards compatible? I imagine there are a lot of cases where the client doesn't care what point release of MODS the thing is serialized as, just that it's MODS and that it can find generally what it's looking for in that structure, right? -Ross.
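The "client doesn't care about the point release" position can be sketched like this: key on the namespace to decide "this is MODS," and read the record-level version attribute only when the point release actually matters. The sample record is invented for illustration:

```python
# Sketch: the namespace identifies the format; the version attribute on
# the root element carries the point release. Sample record is invented.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

record = """<mods xmlns="http://www.loc.gov/mods/v3" version="3.3">
  <titleInfo><title>Example</title></titleInfo>
</mods>"""

root = ET.fromstring(record)
is_mods = root.tag == f"{{{MODS_NS}}}mods"  # same test for any 3.x release
point_release = root.get("version")          # "3.3", if you care
print(is_mods, point_release)
```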
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ideally, though, if we have some buy-in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross. On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote: I am pleased to disagree to various levels of 'strongly' (if we can agree on a definition for it :-). Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he supplied -snip We could have something like:

<http://purl.org/DataFormat/marcxml>
    skos:prefLabel "MARC21 XML" ;
    skos:notation <info:srw/schema/1/marcxml-v1.1> ;
    skos:notation <info:ofi/fmt:xml:xsd:MARC21> ;
    skos:notation <http://www.loc.gov/MARC21/slim> ;
    skos:broader <http://purl.org/DataFormat/marc> ;
    skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd have a real way of knowing what they actually mean. Maybe this is what you mean by a crosswalk. --end Is exactly what I meant by a crosswalk. Basically a translating dictionary which allows any entity (system or person) to relate the various identifiers. I would love to see a single unified set of identifiers; my life as a wrangler of record semantics would be so much easier. But I don't see it happening. That does not mean we should not try. Even a unification in our space (and if not in the library/information space, then where? as Mike said) reduces the larger problem.
However I don't believe it is a scalable solution (which may not matter - if all of a group of users agree, then why not leave them to it?) as, at any time, one group/organisation/person/system could introduce a new scheme, and a world view which relies on unified semantics would no longer be viable. Which means until global unification on an object (better, a (large) set of objects) is achieved it will be necessary to have the translating dictionary and systems which know how to use it. Unification reduces Ray's list of 15 alternative URIs to 14 or 13 or whatever. As long as that number is > 1, translation will be necessary. (I will leave aside discussions of massive record bloat, continual system re-writes, the politics of whose view prevails, the unhelpfulness of compromises for joint solutions, and so on.) Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Friday, May 01, 2009 02:36 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Jonathan Rochkind writes: Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of most library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted over-engineered too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with. I strongly, STRONGLY agree with this.
It's exactly what I was about to write myself, in response to Peter's message, until I saw that Jonathan had saved me the trouble :-) Let's solve the problem that's in front of us right now: bring SRU into harmony with OpenURL in this respect, and the very act of doing so will lend extra legitimacy to the agreed-on identifiers, which will then be more strongly positioned as The Right Identifiers for other initiatives to use. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "You cannot really appreciate Dilbert unless you've read it in the original Klingon." -- Klingon Programming Mantra
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I agree that most software probably won't do it. But the data will be there and free and relatively easy to integrate if one wanted to. In a lot of ways, Jonathan, it's got Umlaut written all over it. Now to get to Jonathan's point -- yes, I think the primary goal still needs to be working towards bringing use of identifiers for a given thing to a single variant. However, we would obviously have to know what the options are in order to figure out what that one is -- while we're doing that, why not enter the different options into the registry and document them in some way (such as, who uses this variant?). Voila, we have a crosswalk. Of course, the downside is that we technically also have a new URI for this resource (since the skos:Concept would need to have a URI), but we could probably hand-wave that away as the id for the registry concept, not the data format. So -- we seem to have some agreement here? -Ross. On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind rochk...@jhu.edu wrote: From my perspective, all we're talking about is using the same URI to refer to the same format(s) across the library community standards this community generally can control. That will make things much easier for developers, especially but not only when building software that interacts with more than one of these standards (as client or server). Now, once you've done that, you've ALSO set the stage for that kind of RDF scenario, among other RDF scenarios. I agree with Mike that that particular scenario is unlikely, but once you set the stage for RDF experimentation like that, if folks are interested in experimenting (and many in our community are), maybe something more attractively useful will come out of it. Or maybe not. Either way, you've made things easier and more inter-operable just by using the same set of URIs across multiple standards to refer to the same thing. So, yeah, I'd still focus on that, rather than any kind of 'crosswalk', RDF or not.
It's the actual use case in front of us, in which the benefit will definitely be worth the effort (if the effort is kept manageable by avoiding trying to solve the entire universe of problems at once). Jonathan Mike Taylor wrote: So what are we talking about here? A situation where an SRU server receives a request for response records to be delivered in a particular format, it doesn't recognise the format URI, so it goes and looks it up in an RDF database and discovers that it's equivalent to a URI that it does know? Hmm ... it's crazy, but it might just work. I bet no-one does it, though. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "Someday, I'll show you around monster-free Tokyo" -- dialogue from Gamera: Guardian of the Universe Peter Noerr writes: I agree with Ross wholeheartedly. Particularly in the use of an RDF based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. Semantics (as in Web) has been exercising my thoughts recently and the problems we have here are writ large over all that the SW people are trying to achieve. Perhaps we can help... Peter
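Mike's lookup scenario can be sketched without any RDF machinery at all. Here the registry is a hard-coded stand-in for the SKOS concept discussed earlier in the thread, and the translate function is an illustrative assumption, not anyone's proposed API:

```python
# Sketch: a server receives a format URI it doesn't recognise and
# consults a registry to find an equivalent it does support. The
# identifiers come from the examples in this thread; the registry
# structure and translate() are hypothetical.
REGISTRY = {
    "http://purl.org/DataFormat/marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
        "http://www.loc.gov/MARC21/slim",
    },
}

def translate(unknown_uri, known_uris):
    """Map an unrecognised format URI onto one the server supports."""
    for concept, notations in REGISTRY.items():
        if unknown_uri == concept or unknown_uri in notations:
            for candidate in {concept} | notations:
                if candidate in known_uris:
                    return candidate
    return None

# A server that only knows the SRU identifier can still satisfy an
# OpenURL-flavoured request:
print(translate("info:ofi/fmt:xml:xsd:MARC21",
                {"info:srw/schema/1/marcxml-v1.1"}))
```

The point of the thread stands either way: if everyone converged on one URI per format, this lookup step would disappear entirely.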
Re: [CODE4LIB] registering info: uris?
So hey, I know nobody wanted to see this thread revived, but I'm hoping you info URI folks can clear something up for me. So I'm trying to gather together a vocabulary of identifiers to unambiguously describe the format of the data you would be getting in a Jangle feed or an unAPI response (or any other variation on this theme): "I have a MODS document and I want *you* to have it too!" Jakob Voss made the (reasonable) suggestion that rather than create yet another identifier or registry to describe these formats, it would make sense to use the work that the SRU: http://www.loc.gov/standards/sru/resources/schemas.html or OpenURL: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats communities have already done. Which makes a lot of sense. It would be nice to use the same identifier in Jangle, SRU and OpenURL to say that this is a MARCXML or ONIX record. Except that OpenURL and SRU /already use different info URIs to describe the same things/: info:srw/schema/1/marcxml-v1.1 vs. info:ofi/fmt:xml:xsd:MARC21, or info:srw/schema/1/onix-v2.0 vs. info:ofi/fmt:xml:xsd:onix. What is the rationale for this? How do we keep up? Are they reusable? Which one should be used? Doesn't this pretty horribly undermine the purpose of using info URIs in the first place? Is anybody else interested in working on a way to unambiguously say "here is a Dublin Core resource as XML, but it is not OAI DC" or "this is text/x-vcard, it conforms to vCard 3.0" in a way that we can reuse among all of our various ways of sharing data? Thanks, -Ross.