Re: [CODE4LIB] DOI scraping

Fitchett, Deborah Tue, 21 May 2013 18:41:33 -0700

Joe and Owen--

Thanks for the ideas!

It's a bit of the opposite goal to LibX, in that rather than having a
title/DOI/whatever from some random site and wanting to get to the full-text
article, I'm looking at the use case of academics who are already viewing the
full-text article and want a link that they can share with students.  Even
aside from the proxy prefix, the URL in their browser may include (or consist
entirely of) session gunk.
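
For illustration, once a DOI is in hand, building the shareable link could look roughly like the sketch below. The proxy prefix is a made-up EZproxy-style placeholder and `proxiedDoiLink` is a hypothetical name; the real prefix depends on the institution's proxy setup.

```javascript
// Hypothetical EZproxy-style prefix (an assumption); substitute the real one.
const PROXY_PREFIX = "https://ezproxy.example.edu/login?url=";

// Build a shareable link: the DOI resolver URL wrapped in the proxy prefix.
// This avoids copying the browser URL, so no session parameters come along.
function proxiedDoiLink(doi, proxyPrefix = PROXY_PREFIX) {
  // Percent-encode unsafe characters, but leave "/" readable in the path.
  const encodedDoi = encodeURIComponent(doi).replace(/%2F/g, "/");
  return proxyPrefix + "https://doi.org/" + encodedDoi;
}

console.log(proxiedDoiLink("10.1000/xyz123"));
```

Because the link is built only from the DOI, it should stay stable regardless of how the academic reached the article.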

I'll try a regexp and see how far that gets me. I'm a bit trepidatious about 
the way the DOI standard allows just about any character imaginable, but at 
least there's the 10. prefix. I'm also aware that if DOIs appear in the
article's bibliography as well, I'll need to make sure the JavaScript can
distinguish between them and the DOI for the article itself; but a lot of this
might be 'cross that bridge if I come to it' stuff.
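
A minimal sketch of the regexp idea, under two loud assumptions: the suffix pattern is a loose heuristic rather than the full DOI syntax, and the article's own DOI is assumed to appear on the page before any DOIs in the bibliography. The function names are hypothetical.

```javascript
// Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a run of
// characters that are not whitespace, quotes, or angle brackets.
// This is a heuristic, not the full DOI syntax.
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"<>]+/g;

// Return every distinct DOI-like string in the text, in first-seen order.
function findDois(text) {
  const matches = text.match(DOI_PATTERN) || [];
  // Trim trailing punctuation that usually belongs to the sentence.
  // Caveat: this would also clip a DOI that really ends in ")" or "]".
  const cleaned = matches.map((doi) => doi.replace(/[.,;:)\]]+$/, ""));
  return [...new Set(cleaned)];
}

// Assumption: the first DOI on the page is the article's own, and any
// later ones come from the bibliography.
function guessArticleDoi(text) {
  const dois = findDois(text);
  return dois.length > 0 ? dois[0] : null;
}
```

In a bookmarklet this might be called as `guessArticleDoi(document.body.innerText)`; pages that list references before the article's own DOI would need a per-site rule instead.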

(As may be jQuery... :-) )

Deborah

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen 
Stephens
Sent: Friday, 17 May 2013 9:01 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] DOI scraping

I'd say yes to the investment in jQuery generally - not too difficult to get
the basics if you already use JavaScript, and makes some things a lot easier.

It sounds like you are trying to do something not dissimilar to LibX
http://libx.org ? (except via bookmarklet rather than as a browser plugin).
Also, if you're looking for custom database scrapers, it might be worth looking
at Zotero translators, as they already exist for many major sources and I guess
will be grabbing the DOI where it exists if they can
http://www.zotero.org/support/dev/translators
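
As a rough illustration of the scraping approach (not a description of how any particular Zotero translator works), many publisher pages embed the DOI in meta tags. The tag names below are ones commonly seen in the wild, but treating them as a reliable list is an assumption, and `doiFromMeta` is a hypothetical helper.

```javascript
// Meta tag names that sometimes carry a DOI, in order of preference.
// The list is an assumption; individual publishers vary.
const DOI_META_NAMES = ["citation_doi", "dc.identifier", "prism.doi"];

// Takes plain {name, content} pairs so it can also run outside a browser.
function doiFromMeta(metaTags) {
  for (const wanted of DOI_META_NAMES) {
    for (const tag of metaTags) {
      const name = (tag.name || "").toLowerCase();
      const content = (tag.content || "").trim();
      // Accept values such as "doi:10.1000/xyz123" by matching the DOI part.
      const match = content.match(/10\.\d{4,9}\/\S+/);
      if (name === wanted && match) return match[0];
    }
  }
  return null;
}

// In a bookmarklet, the pairs could be collected like this:
// const tags = Array.from(document.querySelectorAll("meta[name]"),
//   (m) => ({ name: m.getAttribute("name"), content: m.getAttribute("content") }));
```

Where no such tag exists, a per-database scraper or a text search of the page would still be needed.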

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 17 May 2013, at 05:32, "Fitchett, Deborah" <deborah.fitch...@lincoln.ac.nz> 
wrote:

> Kia ora koutou,
> 
> I’m wanting to create a bookmarklet that will let people on a journal article 
> webpage just click the bookmarklet and get a permalink to that article, 
> including our proxy information so it can be accessed off-campus.
> 
> Once I’ve got a DOI (or other permalink, but I’ll cross that bridge later), 
> the rest is easy. The trouble is getting the DOI. The options seem to be:
> 
> 1. Require the user to locate and manually highlight the DOI on the
> page. This is very easy to code, not so easy for the user who may not even
> know what a DOI is let alone how to find it; and some interfaces make it hard
> to accurately select (I’m looking at you, ScienceDirect).
> 
> 2. Live in hope of universal COinS implementation. I might be waiting a
> long time.
> 
> 3. Work out, for each database we use, how to scrape the relevant
> information from the page. Harder/tedious to code, but makes it easy for the
> user.
> 
> I’ve been looking around for existing code that does something like #3. So
> far I’ve found:
> 
> * CiteULike’s bookmarklet (jQuery at http://www.citeulike.org/bm -
> afaik it’s all rights reserved)
> 
> * Altmetric’s bookmarklet (jQuery at
> http://altmetric-bookmarklet.dsci.it/assets/content.js - MIT licensed)
> 
> Can anyone think of anything else I should be looking at for inspiration?
> 
> Also on a more general matter: I have the general level of JavaScript
> that one gets by poking at things and doing small projects and then 
> getting distracted by other things and then coming back some months 
> later for a different small project and having to relearn it all over 
> again. I’ve long had jQuery on my “I guess I’m going to have to learn 
> this someday but, um, today I just wanna stick with what I know” list. 
> So is this the kind of thing where it’s going to be quicker to learn 
> something about jQuery before I get started, or can I just as easily 
> muddle along with my existing limited JavaScript? (What really are the
> pros and cons here?)
> 
> Nāku noa, nā
> 
> Deborah Fitchett
> Digital Access Coordinator
> Library, Teaching and Learning
> 
> p +64 3 423 0358
> e deborah.fitch...@lincoln.ac.nz
> w library.lincoln.ac.nz
> 
> Lincoln University, Te Whare Wānaka o Aoraki
> New Zealand's specialist land-based university
> 
> 
> ________________________________
> Please consider the environment before you print this email.
> "The contents of this e-mail (including any attachments) may be 
> confidential and/or subject to copyright. Any unauthorised use, 
> distribution, or copying of the contents is expressly prohibited.  If you 
> have received this e-mail in error, please advise the sender by return e-mail 
> or telephone and then delete this e-mail together with all attachments from 
> your system."
> 

