Re: [CODE4LIB] rdf serialization
Karen,

The URIs you gave get me to webpages *about* the Declaration of Independence. I'm sure it's just a copy/paste mistake, but in this context you want the exact right URIs, of course. And by "better" I guess you meant probably more widely used and probably longer lasting? :)

LOC URI for the DoI (the work) is without .html: http://id.loc.gov/authorities/names/n79029194
VIAF URI for the DoI is without the trailing /: http://viaf.org/viaf/179420344

Ben

http://companjen.name/id/BC - me
http://companjen.name/id/BC.html - about me

On 05-11-13 19:03, Karen Coyle li...@kcoyle.net wrote:
> Eric, I found an even better URI for you for the Declaration of Independence: http://id.loc.gov/authorities/names/n79029194.html
> Now that could be seen as being representative of the name chosen by the LC Name Authority, but the related VIAF record, as per the VIAF definition of itself, represents the real world thing itself. That URI is: http://viaf.org/viaf/179420344/
> I noticed that this VIAF URI isn't linked from the Wikipedia page, so I will add that.
> kc
Re: [CODE4LIB] rdf serialization
On Wed, Nov 6, 2013 at 3:47 AM, Ben Companjen ben.compan...@dans.knaw.nl wrote:
> The URIs you gave get me to webpages *about* the Declaration of Independence. I'm sure it's just a copy/paste mistake, but in this context you want the exact right URIs of course. [...] LOC URI for the DoI (the work) is without .html: http://id.loc.gov/authorities/names/n79029194 VIAF URI for the DoI is without trailing /: http://viaf.org/viaf/179420344

Thanks for that Ben. IMHO it's (yet another) illustration of why the W3C's approach to educating the world about URIs for real-world things hasn't quite caught on, while RESTful ones (promoted by the IETF) have. If someone as knowledgeable as Karen can make that mistake, what does it say about our ability as practitioners to use URIs this way, and about our ability to write software that does it as well?

In a REST world, getting a 200 OK doesn't mean the resource is a Web Document. The resource can be anything; you just happened to successfully get a representation of it. If you like, you can provide hints about the nature of the resource in the representation, but the resource itself never goes over the wire; the representation does. It's a subtle but important difference between two ways of looking at Web architecture.

If you find yourself interested in making up your own mind about this, you can find the RESTful definitions of resource and representation in the IETF HTTP RFCs, most recently (as of a few weeks ago) in draft [1]. You can find language about Web Documents (or at least their more recent variant, Information Resource) in the W3C's Architecture of the World Wide Web [2]. Obviously I'm biased towards the IETF's position on this. This is just my personal opinion from my experience as a Web developer trying to explain Linked Data to practitioners, looking at the Web we have, and chatting with good friends who weren't afraid to tell me what they thought.
//Ed

[1] http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-24#page-7
[2] http://www.w3.org/TR/webarch/#id-resources
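Ben's correction earlier in the thread follows two simple conventions, and those can be captured in a tiny helper. This is only a sketch based on the two patterns named in the thread (LOC's ".html" suffix on the record page and VIAF's trailing slash on the permalink); it is not a general-purpose URI normalizer.

```python
def thing_uri(uri: str) -> str:
    """Given a document URL, return the URI of the thing it describes.

    Handles only the two conventions from the thread: a LOC-style
    ".html" suffix and a VIAF-style trailing slash.
    """
    if uri.endswith(".html"):
        return uri[: -len(".html")]
    return uri.rstrip("/")
```

For example, `thing_uri("http://id.loc.gov/authorities/names/n79029194.html")` yields the bare authority URI Ben points to.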
Re: [CODE4LIB] rdf serialization
Yes, I'm going to get sucked into this vi vs. emacs argument for nostalgia's sake... ROTFL, because that is exactly what I was thinking. "Vi is better. No, emacs. You are both wrong; it is all about BBEdit!" Each tool, whether an editor, an email client, or an RDF serialization, has its own strengths and weaknesses. Like religions, none of them are perfect, but they all have some value.

—ELM
[CODE4LIB] Job: Digital Library Applications Developer
** Please excuse any cross-posting **

The Temple University Libraries are seeking a creative and energetic individual to fill the position of Digital Library Applications Developer. Temple's federated library system serves an urban research university with over 1,800 full-time faculty and a student body of 36,000 that is among the most diverse in the nation. For more information about Temple and Philadelphia, visit http://www.temple.edu.

Description

Reporting to the Senior Digital Library Applications Developer and working closely with others in the Digital Library Initiatives Department, help develop and maintain the technological infrastructure for Temple University's digital library initiatives and services, which includes preserving and delivering large collections of digital objects and supporting digital humanities and scholarly communication initiatives throughout the Library. Under the guidance of the supervisor: architect, implement, test, and deploy new tools and services primarily based on open source software, such as Omeka, Fedora Commons, Hydra, and Open Journal Systems (OJS), potentially contributing code to those projects. Perform other duties as assigned.

Required Education and Experience

* BS in Computer Science or a related field, or an equivalent combination of education and experience.

Required Skills and Abilities

* Demonstrated experience with application development in at least one major programming language or framework, such as Ruby on Rails, PHP, or Java.
* Demonstrated experience with MySQL or other database management systems.
* Demonstrated knowledge of the LAMP stack or similar technology stacks.
* Demonstrated ability to perform effective code testing.
* Experience with project requirements gathering.
* Strong organizational and interpersonal skills, demonstrated ability to work in a collaborative, team-based environment and to communicate well with IT and non-IT staff. Commitment to responsive and innovative service.
* Demonstrated ability to write clear documentation.

Preferred

* Experience with a repository system such as Fedora/Hydra, Fedora/Islandora, or DSpace.
* Familiarity with a content management system like Drupal or an exhibit curation system like Omeka.
* Experience working with open source software; experience with version control, test-driven development, and continuous integration techniques.
* Experience with QA testing of web applications.
* Experience with Linux/Unix operating systems, including scripting and commands.
* Experience working with authentication and authorization protocols, including LDAP.
* Knowledge of XML/XSLT.
* Familiarity with digital library standards such as Dublin Core, MARC, METS, EAD, and OAI-PMH.

To apply: Please visit http://www.temple.edu/hr/departments/employment/jobs_within.htm, click on Non-Employees Only, and search for job number TU-17222. For full consideration, please submit your completed electronic application along with a cover letter and resume. Review of applications will begin immediately and will continue until the position is filled. Temple University is an Affirmative Action/Equal Opportunity Employer with a strong commitment to cultural diversity.

--
Katherine Lynch, Senior Digital Library Applications Developer
Temple University Library (http://library.temple.edu)
Samuel L. Paley Library, Room 113, 1210 Polett Walk, Philadelphia, PA 19122
Tel: 215-204-2821 | Fax: 215-204-5201 | Email: katherine.ly...@temple.edu
Re: [CODE4LIB] rdf serialization
Ben,

Yes, I copied from the browser URIs, and that was sloppy. However, it was the quickest thing to do, plus it was addressed to a human, not a machine. The URI for the LC entry is there on the page. Unfortunately, the VIAF URI is called Permalink -- which isn't obvious.

I guess if I want anyone to answer my emails, I need to post mistakes. When I post correct information, my mail goes unanswered (not even a thanks). So, thanks, guys.

kc

On 11/6/13 12:47 AM, Ben Companjen wrote:
> Karen, The URIs you gave get me to webpages *about* the Declaration of Independence. I'm sure it's just a copy/paste mistake, but in this context you want the exact right URIs of course. [...]

--
Karen Coyle kco...@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 skype: kcoylenet
Re: [CODE4LIB] rdf serialization
I could have known it was a test! ;) Thanks Karen :) On 06-11-13 15:20, Karen Coyle li...@kcoyle.net wrote: I guess if I want anyone to answer my emails, I need to post mistakes.
Re: [CODE4LIB] rdf serialization
I wrote about this a few months back at http://blogs.library.duke.edu/dcthree/2013/07/27/the-trouble-with-triples/

I'd be very interested to hear what the smart folks here think!

Hugh

On Nov 5, 2013, at 18:28, Alexander Johannesen alexander.johanne...@gmail.com wrote:
> But the question to every piece of meta data is *authority*, which is the part of RDF that sucks.
[CODE4LIB] databases/indexes with well-structured output
What are some of the more popular and useful bibliographic databases/indexes with well-structured output? If it were easy (trivial) for our readers to get sets of well-structured data out of our bibliographic databases, then it would be relatively easy for us to write software enabling readers to use and understand (that is, evaluate) their data. What databases/indexes lend themselves to this?

Let me elaborate. JSTOR's Data For Research service provides access to the totality of JSTOR, sans the articles themselves unless you are authorized. [1] A person can search JSTOR and then request a data dump complete with citations, keyword frequencies, and n-grams. This data can then be used to create a report, such as a timeline, tag clouds, or concordances, illustrating the characteristics of the found set. About six months ago I wrote a program that is the beginnings of such a report. [2]

Suppose a reader diligently used something like EndNote, Zotero, or RefWorks to save and manage their bibliographic citations of interest. If the reader were to export some or all of their bibliographic data to a file, then the result would be well-structured and computer-readable. Things like titles, authors, keywords/subjects, maybe abstracts, and citations would be neatly delimited. If this file were read by a second computer program, new views of the data could be manifested. Again, a timeline could be created. Word clouds could be created. An analysis could be done against the data to determine frequent authors. Relationships between authors might be exposed. All of this would assist the reader in evaluating their found set.

Through the use of APIs I can search things like WorldCat, the HathiTrust, or the Internet Archive. The result could be (for better or for worse) MARC records. Again, analysis could be done against this data not to find information (that has already been done), but rather to evaluate the data: look for patterns and anomalies.

Put another way, instead of trying to force people to do the best and most perfect bibliographic search, allow them to do broad searches and then provide supplementary tools enabling the reader to examine the results. It is not about finding. It is about using and understanding.

I prefer XML to other data structures, but I will not necessarily limit myself to XML. What information sources would you suggest I use? Here is a short, unordered list:

* JSTOR Data For Research data
* Zotero (RDF) XML output
* WorldCat, HathiTrust, Internet Archive

After I write the "search results evaluation tool", I will then go to the next step and provide tools for the "distant reading" of individual items à la my PDF2TXT application. [3]

We here in libraries can no longer just give people access to information, because people have more access than they know what to do with. Instead, I think an opportunity exists for us to provide tools for evaluating the information they have so they can use and understand it. Call it "scalable, computer-supplemented information literacy".

[1] Data For Research - http://dfr.jstor.org
[2] JSTOR Tool - http://dh.crc.nd.edu/sandbox/jstor-tool/
[3] PDF2TXT - http://dh.crc.nd.edu/sandbox/pdf2txt.cgi

— Eric Morgan, University of Notre Dame
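One of the analyses sketched above, determining frequent authors from an exported citation file, takes only a few lines of code. A minimal sketch, run against a couple of invented RIS-style records rather than a real EndNote or Zotero export:

```python
from collections import Counter

# Invented sample of RIS-style records; AU lines carry author names.
sample_ris = """\
TY  - JOUR
AU  - Smith, Jane
AU  - Jones, Tom
TY  - JOUR
AU  - Smith, Jane
"""

# Count how often each author appears across the exported records.
authors = Counter(
    line.split("-", 1)[1].strip()
    for line in sample_ris.splitlines()
    if line.startswith("AU")
)
```

From a counter like this, a report of the most frequent authors (or the raw material for a word cloud) falls out directly via `authors.most_common()`.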
[CODE4LIB] Free LITA Post-Conference Tutorial on Forthcoming NISO ResourceSync Standard
FYI.

Begin forwarded message:

From: Cynthia Hodgson chodg...@niso.org
Subject: [lita-l] Free LITA Post-Conference Tutorial on Forthcoming NISO ResourceSync Standard
Date: November 6, 2013 at 9:26:30 AM EST
To: LITA-L lit...@ala.org, lita-st...@ala.org
Reply-To: chodg...@niso.org

Participants at the 2013 LITA Forum in Louisville are invited to stay a few hours longer on Sunday, November 10 to attend the ResourceSync Tutorial (http://www.ala.org/lita/conferences/forum/2013/postcon), which will be held after the close of the main conference from 1:30-4:30 p.m. Herbert van de Sompel (http://public.lanl.gov/herbertv/home/), Co-chair of the ResourceSync Working Group, will lead this 3-hour session where attendees can learn how the forthcoming ResourceSync standard can be used to synchronize web resources between servers.

ResourceSync, begun in late 2011, is a joint project between NISO and the Open Archives Initiative (OAI) team, with funding from the Sloan Foundation. The standard, currently in final editing for approval, describes a synchronization framework for the web consisting of various capabilities that allow third-party systems to remain synchronized with a server's evolving resources. The capabilities can be combined in a modular manner to meet local or community requirements. The specification also describes how a server can advertise the synchronization capabilities it supports and how third-party systems can discover this information. The specification repurposes the document formats defined by the Sitemap protocol and introduces extensions for them.

This LITA post-conference tutorial is available at no cost. As we would appreciate knowing how many people are coming, please select the post-conference checkbox on the registration form (http://www.ala.org/lita/conferences/forum/2013/registration). You can also view the beta version of the specification (http://www.openarchives.org/rs/0.9.1/toc) and provide feedback on the ResourceSync Google Group (https://groups.google.com/d/forum/resourcesync). Visit the ResourceSync workroom webpage (http://www.niso.org/workrooms/resourcesync/) for more information about the project.

Cynthia Hodgson
Technical Editor / Consultant
National Information Standards Organization
chodg...@niso.org
301-654-2512

--
Peter Murray
Assistant Director, Technology Services Development
LYRASIS
peter.mur...@lyrasis.org
+1 678-235-2955 / 800.999.8558 x2955
[CODE4LIB] HathiTrust Bib Api - JSONP
Does anyone have a working example of getting JSONP from the HathiTrust Bib API? I can get straight JSON (it seems to ignore the callback parameter):

http://catalog.hathitrust.org/api/volumes/brief/oclc/3967141.json?callback=mycallbackfunction

or JSONP with some unfortunate notices at the top (and yes, I just emailed their 'feedback' address and asked about this):

http://catalog.hathitrust.org/api/volumes/json/oclc:3967141?callback=mycallbackfunction

I'm wondering if I'm just missing the correct URL/syntax.
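If the endpoint does wrap its JSON in a callback, one workaround on the consuming side is to strip the wrapper and parse what's inside. This is a hedged sketch; the payload below is invented, not actual HathiTrust output:

```python
import json
import re

def unwrap_jsonp(payload: str) -> dict:
    """Strip a JSONP wrapper like 'cb({...});' and parse the JSON inside.

    Assumes a single top-level callback call around one JSON object.
    """
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", payload, re.DOTALL)
    if not match:
        raise ValueError("not a JSONP payload")
    return json.loads(match.group(1))
```

This obviously doesn't help with "unfortunate notices at the top"; those would have to be stripped before the wrapper is matched.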
Re: [CODE4LIB] rdf serialization
In the kinds of data I have to deal with, who made an assertion, or what sources provide evidence for a statement, are vitally important bits of information, so it's not just a data-source integration problem, where you're taking batches of triples from different sources and putting them together. It's a question of how to encode scholarly, messy, humanities data. The answer, of course, might be "don't use RDF for that" :-). I'd rather not invent something if I don't have to, though.

Hugh

On Nov 6, 2013, at 10:56, Robert Sanderson azarot...@gmail.com wrote:
> A large number of triples that all have different provenance? I'm curious as to how you get them :)
> Rob

On Wed, Nov 6, 2013 at 8:52 AM, Hugh Cayless philomou...@gmail.com wrote:
> Does that work right down to the level of the individual triple though? If a large percentage of my triples are each in their own individual graphs, won't that be chaos? I really don't know the answer, it's not a rhetorical question!
> Hugh

On Nov 6, 2013, at 10:40, Robert Sanderson azarot...@gmail.com wrote:
> Named Graphs are the way to solve the issue you bring up in that post, in my opinion. You mint an identifier for the graph, and associate the provenance and other information with that. This then gets ingested as the 4th URI into a quad store, so you don't lose the provenance information. In JSON-LD:
>
>     {
>       "@id": "uri-for-graph",
>       "dcterms:creator": "uri-for-hugh",
>       "@graph": [
>         // ... triples go here ...
>       ]
>     }
>
> Rob
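Rob's named-graph suggestion can be illustrated with a toy quad store in a few lines. This is only a sketch of the idea (triples keyed by a graph URI, with provenance hung off the graph rather than the triple), not a real triple store, and all the URIs are invented placeholders:

```python
# Toy sketch of the named-graph idea: each triple is stored as a quad
# (subject, predicate, object, graph), and provenance attaches to the
# graph URI. All URIs here are invented placeholders.
quads = []          # (subject, predicate, object, graph)
graph_creator = {}  # graph URI -> dcterms:creator value

def assert_triple(s, p, o, graph, creator):
    """Record a triple in a named graph and note who asserted it."""
    quads.append((s, p, o, graph))
    graph_creator[graph] = creator

def provenance_of(s, p, o):
    """Return the creators of every graph that asserts this triple."""
    return [graph_creator[g] for (s2, p2, o2, g) in quads
            if (s2, p2, o2) == (s, p, o)]

assert_triple("ex:doi", "dcterms:title", "Declaration of Independence",
              "ex:graph-1", "ex:hugh")
```

Hugh's worry about chaos shows up here as scale: nothing stops every triple from living in its own graph, but then `graph_creator` grows as fast as `quads` does.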
Re: [CODE4LIB] rdf serialization
Hugh,

I don't think you're in the weeds with your question (and, while I think that named graphs can provide a solution to your particular problem, that doesn't necessarily mean that it doesn't raise more questions or potentially more frustrations down the line; like any new power, it can be used for good or evil, and the difference might not be obvious at first).

My question for you, however, is: why are you using a triple store for this? That is, why bother with the broad and general model in what I assume is a closed-world assumption in your application? We don't generally use XML databases (MarkLogic being a notable exception), or MARC databases, or <insert your transmission format of choice>-specific databases, because transmission formats are usually designed to account for lots and lots of variations and maximum flexibility, which is generally the opposite of the modeling that goes into a specific app.

I think there's a world of difference between modeling your data so it can be represented in RDF (and, possibly, made available via SPARQL, though I think there is *far* less value there) and committing to RDF all the way down. RDF is a generalization so multiple parties can agree on what data means, but I would have a hard time swallowing the argument that domain-specific data must be RDF-native.

-Ross.

On Wed, Nov 6, 2013 at 10:52 AM, Hugh Cayless philomou...@gmail.com wrote:
> Does that work right down to the level of the individual triple though? If a large percentage of my triples are each in their own individual graphs, won't that be chaos? I really don't know the answer, it's not a rhetorical question!
> Hugh
Re: [CODE4LIB] rdf serialization
Ross,

I agree with your statement that data doesn't have to be RDF all the way down, etc. But I'd like to hear more about why you think SPARQL availability has less value, and whether you see an alternative to SPARQL for querying.

kc

On 11/6/13 8:11 AM, Ross Singer wrote:
> Hugh, I don't think you're in the weeds with your question. [...] My question for you, however, is why are you using a triple store for this? That is, why bother with the broad and general model in what I assume is a closed world assumption in your application?

--
Karen Coyle kco...@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 skype: kcoylenet
Re: [CODE4LIB] rdf serialization
The answer is purely because the RDF data model and the technology around it looks like it would almost do what we need it to. I do not, and cannot, assume a closed world. The open world assumption is one of the attractive things about RDF, in fact :-) Hugh On Nov 6, 2013, at 11:11 , Ross Singer rossfsin...@gmail.com wrote: My question for you, however, is why are you using a triple store for this? That is, why bother with the broad and general model in what I assume is a closed world assumption in your application?
Re: [CODE4LIB] rdf serialization
Hey Karen,

It's purely anecdotal (albeit anecdotes borne from working at a company that offered, and has since abandoned, a SPARQL-based triple store service), but I just don't see the interest in arbitrary SPARQL queries against remote datasets that I do in linking to (and grabbing) known items. I think there are multiple reasons for this:

1) Unless you're already familiar with the dataset behind the SPARQL endpoint, where do you even start constructing useful queries?

2) SPARQL as a query language is a combination of too powerful and completely useless in practice: query timeouts are commonplace, endpoints don't support all of 1.1, etc. And, going back to point #1, it's hard to know how to optimize your queries unless you are already pretty familiar with the data.

3) SPARQL is a flawed API interface from the get-go (IMHO), for the same reason we don't offer a public SQL interface to our RDBMSes.

Which isn't to say it doesn't have its uses or applications. I just think that in most cases domain/service-specific APIs (be they RESTful, based on the Linked Data API [0], whatever) will likely be favored over generic SPARQL endpoints. Are n+1 different APIs ideal? I am pretty sure the answer is no, but that's the future I foresee, personally.

-Ross.

0. https://code.google.com/p/linked-data-api/wiki/Specification

On Wed, Nov 6, 2013 at 11:28 AM, Karen Coyle li...@kcoyle.net wrote:
> Ross, I agree with your statement that data doesn't have to be RDF all the way down, etc. But I'd like to hear more about why you think SPARQL availability has less value, and if you see an alternative to SPARQL for querying.
> kc
Re: [CODE4LIB] rdf serialization
Hugh, I'm skeptical of this in a usable application or interface. Applications have constraints. There are predicates you care about, there are values you display in specific ways. There are expectations, based on the domain, in the data that are either driven by the interface or the needs of the consumers. I have yet to see an example of arbitrary and unexpected data exposed in an application that people actually use. -Ross. On Wed, Nov 6, 2013 at 11:39 AM, Hugh Cayless philomou...@gmail.com wrote: The answer is purely because the RDF data model and the technology around it looks like it would almost do what we need it to. I do not, and cannot, assume a closed world. The open world assumption is one of the attractive things about RDF, in fact :-) Hugh On Nov 6, 2013, at 11:11 , Ross Singer rossfsin...@gmail.com wrote: My question for you, however, is why are you using a triple store for this? That is, why bother with the broad and general model in what I assume is a closed world assumption in your application?
Re: [CODE4LIB] rdf serialization
I think that the answer to #1 is that if you want or expect people to use your endpoint that you should document how it works: the ontologies, the models, and a variety of example SPARQL queries, ranging from simple to complex. The British Museum's SPARQL endpoint ( http://collection.britishmuseum.org/sparql) is highly touted, but how many people actually use it? I understand your point about SPARQL being too complicated for an API interface, but the best examples of services built on SPARQL are probably the ones you don't even realize are built on SPARQL (e.g., http://numismatics.org/ocre/id/ric.1%282%29.aug.4A#mapTab). So on one hand, perhaps only the most dedicated and hardcore researchers will venture to construct SPARQL queries for your endpoint, but on the other, you can build some pretty visualizations based on SPARQL queries conducted in the background from the user's interaction with a simple html/javascript based interface. Ethan On Wed, Nov 6, 2013 at 11:54 AM, Ross Singer rossfsin...@gmail.com wrote: Hey Karen, It's purely anecdotal (albeit anecdotes borne from working at a company that offered, and has since abandoned, a sparql-based triple store service), but I just don't see the interest in arbitrary SPARQL queries against remote datasets that I do against linking to (and grabbing) known items. I think there are multiple reasons for this: 1) Unless you're already familiar with the dataset behind the SPARQL endpoint, where do you even start with constructing useful queries? 2) SPARQL as a query language is a combination of being too powerful and completely useless in practice: query timeouts are commonplace, endpoints don't support all of 1.1, etc. 
And, going back to point #1, it's hard to know how to optimize your queries unless you are already pretty familiar with the data 3) SPARQL is a flawed API interface from the get-go (IMHO) for the same reason we don't offer a public SQL interface to our RDBMSes Which isn't to say it doesn't have its uses or applications. I just think that in most cases domain/service-specific APIs (be they RESTful, based on the Linked Data API [0], whatever) will likely be favored over generic SPARQL endpoints. Are n+1 different APIs ideal? I am pretty sure the answer is no, but that's the future I foresee, personally. -Ross. 0. https://code.google.com/p/linked-data-api/wiki/Specification On Wed, Nov 6, 2013 at 11:28 AM, Karen Coyle li...@kcoyle.net wrote: Ross, I agree with your statement that data doesn't have to be RDF all the way down, etc. But I'd like to hear more about why you think SPARQL availability has less value, and if you see an alternative to SPARQL for querying. kc On 11/6/13 8:11 AM, Ross Singer wrote: Hugh, I don't think you're in the weeds with your question (and, while I think that named graphs can provide a solution to your particular problem, that doesn't necessarily mean that it doesn't raise more questions or potentially more frustrations down the line - like any new power, it can be used for good or evil and the difference might not be obvious at first). My question for you, however, is why are you using a triple store for this? That is, why bother with the broad and general model in what I assume is a closed world assumption in your application? We don't generally use XML databases (Marklogic being a notable exception), or MARC databases, or insert your transmission format of choice-specific databases because usually transmission formats are designed to account for lots and lots of variations and maximum flexibility, which generally is the opposite of the modeling that goes into a specific app. 
I think there's a world of difference between modeling your data so it can be represented in RDF (and, possibly, available via SPARQL, but I think there is *far* less value there) and committing to RDF all the way down. RDF is a generalization so multiple parties can agree on what data means, but I would have a hard time swallowing the argument that domain-specific data must be RDF-native. -Ross. On Wed, Nov 6, 2013 at 10:52 AM, Hugh Cayless philomou...@gmail.com wrote: Does that work right down to the level of the individual triple, though? If a large percentage of my triples are each in their own individual graphs, won't that be chaos? I really don't know the answer, it's not a rhetorical question! Hugh On Nov 6, 2013, at 10:40, Robert Sanderson azarot...@gmail.com wrote: Named Graphs are the way to solve the issue you bring up in that post, in my opinion. You mint an identifier for the graph, and associate the provenance and other information with that. This then gets ingested as the 4th URI into a quad store, so you don't lose the provenance information. In JSON-LD:

{
  "@id": "uri-for-graph",
  "dcterms:creator": "uri-for-hugh",
  "@graph": [ ... ]
}
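Mechanically, the SPARQL endpoints discussed above are just HTTP services: the query travels in a `query` parameter and the endpoint can return SPARQL JSON results. Here is a minimal Python sketch using only the standard library, with the British Museum endpoint from the thread as an illustrative target; the catch-all triple pattern also illustrates Ross's point 1, since without knowing a dataset's ontology a generic pattern is about all you can write.

```python
import urllib.parse
import urllib.request

# A generic query that runs against most endpoints: grab a few triples.
# (Illustrative only -- useful queries need knowledge of the dataset's
# ontology, which is exactly the discoverability problem raised above.)
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

def build_sparql_request(endpoint, query):
    """Build a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

# Not executed here (it needs network access); urlopen(req) would return a
# JSON document with "head" and "results" keys per the SPARQL results spec.
req = build_sparql_request("http://collection.britishmuseum.org/sparql", QUERY)
```

Sending `req` with `urllib.request.urlopen` and parsing the body with `json.load` is all a thin client needs, which is part of why wrapping SPARQL behind a domain-specific API is so easy to justify.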
[CODE4LIB] Job: Associate University Librarian for Library Information Technology, University of Michigan at University of Michigan
Associate University Librarian for Library Information Technology, University of Michigan University of Michigan Ann Arbor The **University of Michigan Library** is transforming the way libraries organize, preserve, and share access to knowledge in service of the mission of one of the world's leading research universities. We seek a forward-thinking, collaborative, mission-driven, and innovative Associate University Librarian (AUL) to join the library's leadership team, reporting to the Dean of Libraries. **Associate University Librarian for Library Information Technology (LIT)** The AUL for LIT will lead the development of information technology in support of the university's current and emerging research needs, and the advancement of scholarly literacy and instructional technologies. To direct the development, management, and maintenance of a flexible and reliable technology environment, the AUL for LIT will lead 60 talented staff members in six units: Core Services, Digital Library Production Services, Learning Technology Incubation Group, Library Systems, User Experience, and Web Systems. The AUL for LIT must possess the technical and conceptual knowledge to represent the library in broad conversations about IT, and advance the campus-wide development of emerging instructional technologies as well as systems to enable emerging research needs, including the management and preservation of data. We are searching for professionals with a deep understanding of the myriad and changing roles of the library, who view publishing and information technology as integral to our mission, and who can excel within the context of a world-class research university. Because we are committed to diversity, we ask our leaders to develop and nurture the individual and collective skills to recognize, celebrate, and deploy difference as a path to engagement, innovation, and the generation of new ideas. 
More information is available at: [http://tinyurl.com/UMLib-AUL-LIT](http://tinyurl.com/UMLib-AUL-LIT). Submit nominations or questions to: aulsea...@umich.edu. Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10610/
[CODE4LIB] catqc / marclib
I posted our shelf-ready record analyzer and a small C library (on which it depends) on SourceForge. If someone could build and test the utility in a non-Windows environment, I would greatly appreciate it. If anyone is interested in using it or has any questions, let me know. https://sourceforge.net/projects/marclib https://sourceforge.net/projects/catqc mj Michael Jay, Library IT Suite 1250 2046 Waldo Road Gainesville, FL 32609 352.273.2678 em...@ufl.edu
[CODE4LIB] How to generate a Word document which displays full text links in the output
For those of you who do literature searches for patrons, here is a custom EndNote style that can generate a Word document which displays full-text links in the output. https://dl.dropboxusercontent.com/u/2014679/customlinktodoi.ens To make this work, customize the style so that it follows your local institution's OpenURL syntax, and, of course, be sure to get bibliographic records from authoritative sources like MEDLINE or Web of Knowledge. (Those are the only two I've tried this out on so far.) If anyone has ideas for improving this further, please let me know, and I'll update the file. Thanks, Paul Paul Albert Project Manager, VIVO Weill Cornell Medical Library 646.962.2551
[CODE4LIB] Canadian WordPress Hosting
Hi Everyone, Apologies for cross-posting, but code4lib is much more active, and has more Canadians than I've seen elsewhere. I was wondering if anyone had recommendations for a WordPress hosting solution? And yes, it needs to be in Canada. I can do most of my own dev-type work, so really it just needs to be set up to run WordPress (preferably with 1-click install), and most of all, reliable, hopefully with good customer service for when we need to contact the company. Okay, also preferable is that they do daily backups for us and have excellent security (considering it's WordPress). Too many hosting solutions include email and a bunch of other stuff, and I need it only for WordPress and nothing else. A name, plus at least 1-2 reasons for the recommendation, would be great! Thanks in advance, Cynthia
[CODE4LIB] Citing source code in high-profile academic journals
Hello, I need some advice about referencing source code in an academic journal. I rarely see it happen and I don’t know why. Background: I’m building a website that connects academic researchers with software developers interested in helping scientists write code. My goal is for these researchers to be able to reference any new source code in the articles they publish -- much like a “gene accession number” or a “PDB code”. Unfortunately, I don’t see any code repositories referenced in high-profile journals like Science or PNAS. I’m guessing it’s because the code in the repositories isn’t permanent and may be deleted at any time? Or perhaps a DOI needs to be assigned? So my question to the group is: What criteria are necessary for a code repository or database to be eligible for referencing in scientific academic journals? Some ideas I have, based on looking at the Protein Data Bank and GenBank, are: 1) The entry is permanent -- we can’t delete articles once they’ve been published, and the same is true for entries in the PDB and GenBank 2) The entry gives credit to all authors and contributors 3) The entry has a DOI 4) The entry has a simple accession number -- a PDB ID is a four-character code, a GenBank accession number is six characters. Is there anything I’m missing? Any advice would be greatly appreciated. Thank you Heather Claxton-Douglas, PhD www.sciencesolved.com http://igg.me/at/ScienceSolved
Re: [CODE4LIB] more suggestions for code4lib.org
Hi Kevin, Thank you for the suggestions. a) is done. (Looks like someone already changed the links on the About page.) c) I'm torn on. I understand what you mean, but this list or IRC (or even Twitter) might be better. I don't know of a way to have a message go to all people with admin rights on Drupal. Ryan Wick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kevin Hawkins Sent: Monday, November 04, 2013 8:31 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] more suggestions for code4lib.org While we're making suggestions for improving the infrastructure of code4lib.org, here are some things I'd like to see improved: a) Change the email link in the navbar (and in the text at http://code4lib.org/about ) from https://listserv.nd.edu/cgi-bin/wa?SUBED1=CODE4LIB&A=1 to https://listserv.nd.edu/cgi-bin/wa?A0=CODE4LIB so that people can easily find the list archives and poke around recent messages before deciding whether to join. b) Modify whatever code sends formatted job postings to this list so that it includes the location of the position. c) Add a contact link so people have a clear place to go to report administrivia like point (a) above or broken links. It might go to whichever users have admin privileges on the Drupal instance behind code4lib.org. Thanks for your consideration, Kevin
Re: [CODE4LIB] more suggestions for code4lib.org
For C, directing people to the list would be best, but you could point the email to a Gmail box and set up forwarding rules. Riley Childs Library Director and IT Admin Junior Charlotte United Christian Academy P: 704-497-2086 (Anytime) P: 704-537-0331 x101 (M-F 7:30am-3pm ET) Sent from my iPhone Please excuse mistakes On Nov 6, 2013, at 8:05 PM, Wick, Ryan ryan.w...@oregonstate.edu wrote: Hi Kevin, Thank you for the suggestions. a) is done.
Re: [CODE4LIB] more suggestions for code4lib.org
On Mon, Nov 4, 2013 at 11:31 PM, Kevin Hawkins kevin.s.hawk...@ultraslavonic.info wrote: b) Modify whatever code sends formatted job postings to this list so that it includes the location of the position. That would be shortimer, and I think it should be doing what you suggest now? https://github.com/code4lib/shortimer/commit/acb57090d4842920c9f92c684810f3c618f0a21e If not let me know, create a github issue, or send a pull request :-) //Ed
Re: [CODE4LIB] We should use HTTPS on code4lib.org
It sounds like we are willing to throw security under the bus for an edge case, although I am sure that I am missing some subtlety Cary On Nov 5, 2013, at 10:27 AM, Ross Singer rossfsin...@gmail.com wrote: On Tue, Nov 5, 2013 at 12:07 PM, William Denton w...@pobox.com wrote: (Question: Why does HTTPS complicate screen-scraping? Every decent tool and library supports HTTPS, doesn't it?) Birkin asked me this same question, and I realized I should clarify what I meant. I was mostly referring to existing screen scrapers/existing web sites. If you redirect every request from http to https, this will probably break things. I think the Open Library example that Karen mentioned is a good case study. And it's pretty different for a library or tool to support HTTPS and a specific app to be expecting it. If you follow the thread around that OL change, it appears there are issues with Java (as one example) arbitrarily consuming HTTPS (from what I understand, you need to have the cert locally?), but I don't know enough about it to say for certain. I think there would also probably be potential issues around mashups (AJAX, for example), but seeing as code4lib.org doesn't support CORS, not really a current issue. Does apply more generally to your question about library websites at large, though. Anyway, I agree with you that the option for both should be there. I'm not just not convinced that HTTPS-all-the-time is necessary for all web use cases. -Ross.
Re: [CODE4LIB] We should use HTTPS on code4lib.org
Why? HTTPS is used when there is sensitive data involved, code4lib.org (at least to my knowledge) does not have sensitive data? Riley Childs Library Director and IT Admin Junior Charlotte United Christian Academy P: 704-497-2086 (Anytime) P: 704-537-0331 x101 (M-F 7:30am-3pm ET) Sent from my iPhone Please excuse mistakes On Nov 6, 2013, at 8:28 PM, Cary Gordon listu...@chillco.com wrote: It sounds like we are willing to throw security under the bus for an edge case, although I am sure that I am missing some subtlety Cary
Re: [CODE4LIB] We should use HTTPS on code4lib.org
SSL certs are expensive because of the administrative work associated with it. Riley Childs Library Director and IT Admin Junior Charlotte United Christian Academy P: 704-497-2086 (Anytime) P: 704-537-0331 x101 (M-F 7:30am-3pm ET) Sent from my iPhone Please excuse mistakes On Nov 6, 2013, at 8:28 PM, Cary Gordon listu...@chillco.com wrote: It sounds like we are willing to throw security under the bus for an edge case, although I am sure that I am missing some subtlety Cary
Re: [CODE4LIB] We should use HTTPS on code4lib.org
How is security getting thrown under the bus? -Ross. On Wednesday, November 6, 2013, Cary Gordon wrote: It sounds like we are willing to throw security under the bus for an edge case, although I am sure that I am missing some subtlety Cary
Re: [CODE4LIB] We should use HTTPS on code4lib.org
I guess I just don't see why http and https can't coexist. -Ross. On Nov 6, 2013 9:39 PM, Cary Gordon listu...@chillco.com wrote: This conversation is heading into the draining the swamp category. Bill Denton started this thread with the suggestion that we use HTTPS everywhere. He did not make a specific case for it. I am just guessing that an argument for going that route would include security. Regardless of whether this is a good idea, or whether there is a compelling reason for doing it, the possibility of its making it difficult for older scraping tools to scrape the site does not seem like a compelling reason not to do it. The cost issue, on the other hand, would be a more compelling consideration. Thanks, Cary On Nov 6, 2013, at 6:17 PM, Ross Singer rossfsin...@gmail.com wrote: How is security getting thrown under the bus? -Ross.
Re: [CODE4LIB] We should use HTTPS on code4lib.org
On Wed, Nov 6, 2013 at 8:49 PM, Ross Singer rossfsin...@gmail.com wrote: I guess I just don't see why http and https can't coexist. They can definitely coexist, but there is a corresponding maintenance cost and a slightly higher risk profile (e.g., session hijacking is still possible in a variety of mixed http/https configurations). I noticed a pretty good, if a bit dated, run-down of the tradeoffs for various secure setups in Drupal (http://drupalscout.com/knowledge-base/drupal-and-ssl-multiple-recipes-possible-solutions-https). Even if the solutions have somewhat changed, it does get at the idea of what some of the tradeoffs are between security, usability, and maintenance. Just today, I noticed a security alert (https://drupal.org/node/2129381) for the Drupal 6 Secure Pages module where theoretically secured pages and forms could be transmitted in the clear. This is the module you'd most likely use to achieve a mixed http/https site in Drupal. I have personally tended to just put everything behind https because of the added work/modules/maintenance associated with running it alongside http (in Drupal, specifically), but I am a lazy person with access to free certs and fancier servers. HTH -- Chad Fennell Web Developer University of Minnesota Libraries (612) 626-4186
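The "everything behind https" approach Chad describes can also be handled entirely at the web server, in front of Drupal. Here is a minimal nginx sketch (the hostname and certificate paths are placeholders, not details from this thread): redirect every plain-HTTP request to HTTPS and send an HSTS header so returning browsers skip http entirely, which avoids the mixed http/https session-hijacking exposure mentioned above.

```nginx
# Redirect all plain-HTTP traffic to HTTPS (placeholder hostname).
server {
    listen 80;
    server_name example.org;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name example.org;

    # Placeholder certificate paths.
    ssl_certificate     /etc/ssl/certs/example.org.crt;
    ssl_certificate_key /etc/ssl/private/example.org.key;

    # Ask browsers to use HTTPS for future visits (HSTS, one year).
    add_header Strict-Transport-Security "max-age=31536000";

    # ... the actual site configuration (e.g., Drupal/PHP handling) ...
}
```

A server-level setup like this sidesteps module-level solutions such as Secure Pages, at the cost of redirecting any clients hardcoded to plain http, which is the scraper concern raised earlier in the thread.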