Re: [Bibdesk-develop] arXiv support

Christiaan Hofman Tue, 17 Mar 2009 04:02:52 -0700

On 17 Mar 2009, at 10:45 AM, Sven-S. Porst wrote:

> It's nice to see arXiv support come to BibDesk.
>
> Currently I am still using the script I created with my colleague for
> the purpose because it gives better results (available at 
> http://earthlingsoft.net/ssp/arXivToBibDesk.html
>  - I guess I should add that to the scripts link on the wiki).
>
> It's no problem for me to keep using the script, but I think I'll just
> list the differences here in case anybody wonders about what people
> consider useful.
>
> . our import has year, comments and MRClass fields


BD gets the year if there's a journal listed. That typically won't
happen for new articles. We could get the year and month from the
arXiv number though, as I do in my scripts. However, that's arguably
wrong, as there is no publication and hence no publication date yet in
that case.
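
As a rough sketch of the idea (not BibDesk's actual code, and not the
scripts mentioned above), the year and month can be read off the YYMM
prefix of an arXiv identifier. The function name and the regular
expression are assumptions made for illustration; the sketch covers
new-style numbers such as 0903.1234 and old-style ones such as
hep-th/9901001, and treats YY >= 91 as 19YY because the arXiv started
in 1991.

```python
import re

# Hypothetical pattern: optional old-style archive prefix ("hep-th/",
# "math.AG/"), then YYMM, then the sequence number and optional version.
_ARXIV_ID = re.compile(
    r"^(?:[a-z\-]+(?:\.[A-Z]{2})?/)?(\d{2})(\d{2})(?:\.\d{4,5}|\d{3})(?:v\d+)?$"
)

def arxiv_year_month(arxiv_id):
    """Return (year, month) encoded in an arXiv number, or None."""
    m = _ARXIV_ID.match(arxiv_id.strip())
    if m is None:
        return None
    yy, mm = int(m.group(1)), int(m.group(2))
    if not 1 <= mm <= 12:
        return None
    # Two-digit years from 91 upwards belong to the 1990s.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year, mm

print(arxiv_year_month("0903.1234"))
print(arxiv_year_month("hep-th/9901001"))
```

Note that this yields the submission month, not a publication date,
which is exactly the objection raised above.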

> . our import has an abstract (I think I saw that work on 'new' pages
> for the arXiv but not for search results)

That's because "new" pages have an abstract, while "find" pages don't.  
See also the other comments.

> . our import downloads the PDF, places it into the publications folder
> and adds it to the Bib

Automatic downloading should definitely not be done (who says you even
want to import the item?). BD does add a link to the PDF if it's
present, and that's all it should do. You can then use the fileview or
a script hook to download.

> . our import makes sure there are {} around capital letters in paper
> titles. The fact that BibDesk doesn't do that makes me wonder whether
> anybody actually used the feature so far.

That shouldn't be done by default, it's a responsibility of the user.  
Some cleaning is the most that should be done automatically.
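
For readers wondering what the brace protection in the quoted script
amounts to, here is a minimal sketch under assumptions: it is neither
the perl/AppleScript code nor anything in BibDesk, and the function
name is hypothetical. It wraps each run of capital letters in {} unless
the run already sits inside braces. It does not understand TeX
commands, so a title containing something like \Gamma would be mangled.

```python
def protect_capitals(title):
    """Wrap runs of capitals in {} so BibTeX styles keep their case."""
    out = []
    depth = 0  # current brace nesting level
    i = 0
    while i < len(title):
        ch = title[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if depth == 0 and ch.isupper():
            # Collect the whole run of capitals and brace it once.
            j = i
            while j < len(title) and title[j].isupper():
                j += 1
            out.append("{" + title[i:j] + "}")
            i = j
            continue
        out.append(ch)
        i += 1
    return "".join(out)

print(protect_capitals("Quantum Hall effect in AdS"))
```

Whether to run something like this automatically is the point of
disagreement above; as a script hook it stays under the user's control.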

> . our import works for single articles (i.e. on article abstract pages
> rather than article lists)
>

That's also the difference in use. In BD, you browse, immediately
see the linked items, and then choose which ones you want to import,
if any. With your script, you have to browse to a specific item first
and then decide that you want to import that item, which is actually
more work. This is also related to downloading the PDF. In fact, your
script is more closely related to the "import" button than to the web
scraper.

BD could also try to interpret the single pages, but from BD's point of
view I think that's less useful, so I did not bother spending time on
it.

> The script itself is an ugly combination of perl and AppleScript (i.e.
> pest meeting cholera), so I'm not sure it will be helpful for any
> BibDesk plans.
>

I don't think so. BD only scrapes what's available; the rest is
typically just not available. Loading related (abstract) pages is not
an option, as the arXiv is very aggressive about blocking automatic
downloads. The only other option is OAI, but that usually gives even
worse results.

> From what I am seeing the strings scraped from the arXiv site should
> also be trimmed of their whitespace before the BibDesk publication is
> created from them. Sometimes leading spaces seem to sneak in for
> author and journal names at least. I can add the code for that if you
> want.
>
> Best
>
>               Sven
>
> -- 
> Sven-S. Porst . http://earthlingsoft.net/ssp . AIM: cv47al
> Pass as best inventor!
>

Thanks, I already added some extra cleaning.
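
The kind of cleaning discussed here can be sketched as follows. This is
an illustration only, not the code that was added to BibDesk; the
function name and the dictionary of fields are assumptions. It strips
leading and trailing whitespace from every scraped string and collapses
internal runs of whitespace to a single space.

```python
def clean_fields(fields):
    """Return a copy of the scraped fields with whitespace normalised."""
    # str.split() with no argument drops leading/trailing whitespace and
    # splits on any run of whitespace, including newlines and tabs.
    return {key: " ".join(value.split()) for key, value in fields.items()}

scraped = {"Author": "  Porst,  Sven-S. ", "Journal": " J. Math.\n Phys."}
print(clean_fields(scraped))
```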

Christiaan


_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
