On 17 Mar 2009, at 10:45 AM, Sven-S. Porst wrote:

> It's nice to see arXiv support come to BibDesk.
>
> Currently I am still using the script I created with my colleague for
> the purpose because it gives better results (available at
> http://earthlingsoft.net/ssp/arXivToBibDesk.html
> - I guess I should add that to the scripts link on the wiki).
>
> It's no problem for me to keep using the script, but I think I'll just
> list the differences here in case anybody wonders about what people
> consider useful.
>
> . our import has year, comments and MRClass fields

BD gets the year if there's a journal listed. That typically won't
happen for new articles. We could get the year and month from the arXiv
number though, as I do in my scripts. However, that's arguably wrong, as
in that case there is no publication, and hence no publication date, yet.

> . our import has an abstract (I think I saw that work on 'new' pages
> for the arXiv but not for search results

That's because "new" pages have an abstract, while "find" pages don't.
See also the other comments.

> . our import downloads the PDF, places it into the publications folder
> and adds it to the Bib

Automatic downloading should definitely not be done (who says you even
want to import the item?). BD does add a link to the PDF if it's
present, and that's all it should do. You can then use the fileview or a
script hook to download.

> . our import makes sure there are {} around capital letters in paper
> titles. The fact that BibDesk doesn't do that makes me wonder whether
> anybody actually used the feature so far.

That shouldn't be done by default; it's the responsibility of the user.
Some cleaning is the most that should be done automatically.

> . our import works for single articles (i.e. on article abstract pages
> rather than article lists)
>

That's also the difference in use. In BD, you browse and immediately
see the linked items, then choose which ones you want to import, if any.
With your script, you have to browse to a specific item first, and then
decide that you want to import that item. That's actually more work. This
is also related to downloading the PDF. In fact, your script is more
closely related to the "import" button than to the web scraper. BD could
also try to interpret the single pages, but from BD's point of view I
think that's less useful, so I did not bother spending time on it.

> The script itself is an ugly combination of perl and AppleScript (i.e.
> pest meeting cholera), so I'm not sure it will be helpful for any
> BibDesk plans.
>

I don't think so.
BD only scrapes what's available; the rest is typically just not
available. Loading related (abstract) pages is not an option, as the
arXiv is very aggressive about blocking automatic downloads. The only
option is OAI, but that usually gives even worse results.

> From what I am seeing the strings scraped from the arXiv site should
> also be trimmed of their whitespace before the BibDesk publication is
> created from them. Sometimes leading spaces seem to sneak in for
> author and journal names at least. I can add the code for that if you
> want.
>
> Best
>
> Sven
>
> --
> Sven-S. Porst . http://earthlingsoft.net/ssp . AIM: cv47al
> Pass as best inventor!
>

Thanks, I already added some extra cleaning.

Christiaan

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
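
For illustration only, here is a minimal Python sketch of two points from
the thread: taking the year and month from the arXiv number, and trimming
whitespace from scraped strings. It is neither BibDesk's code nor the
perl/AppleScript script mentioned above. The function names
arxiv_date_from_id and clean_field are made up for this example, and the
pattern assumes the two arXiv numbering schemes (old style such as
hep-th/9901001, new style such as 0903.1234). As noted in the thread, the
result is a submission date, not a publication date.

```python
import re

# Assumed identifier shapes: "hep-th/9901001", "math.GT/0309136v1",
# "0903.1234", optionally prefixed with "arXiv:".
_ARXIV_ID = re.compile(
    r"""
    ^(?:arXiv:)?                      # optional arXiv: prefix
    (?:[a-z\-]+(?:\.[A-Za-z\-]+)?/)?  # old-style archive, e.g. hep-th/ or math.GT/
    (?P<yy>\d{2})(?P<mm>\d{2})        # two-digit year and month
    (?:\.\d{4,5}|\d{3})               # sequence number (new or old style)
    (?:v\d+)?$                        # optional version suffix
    """,
    re.VERBOSE,
)


def arxiv_date_from_id(arxiv_id):
    """Return (year, month) encoded in an arXiv number, or None if it does not parse."""
    match = _ARXIV_ID.match(arxiv_id.strip())
    if match is None:
        return None
    yy, mm = int(match.group("yy")), int(match.group("mm"))
    if not 1 <= mm <= 12:
        return None
    # arXiv started in 1991, so 91-99 means 19xx and anything lower means 20xx.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year, mm


def clean_field(value):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(value.split())


if __name__ == "__main__":
    print(arxiv_date_from_id("0903.1234"))
    print(arxiv_date_from_id("hep-th/9901001"))
    print(repr(clean_field("  Porst,  Sven-S.\n")))
```

Whether such a date belongs in the year and month fields at all is the open
question raised above; the sketch only shows that the information can be
recovered from the number.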