I didn’t know it did that. What are the requirements for this to work? I tried four PDF files and only one of them worked.
My script converts the first page of the PDF to HTML, finds the text with the largest font size (with some other logic), and assumes that is the title. It then uses Google Scholar to grab the bibliographic information based on that title. As long as the file is not so old that it is a bitmap, it is pretty reliable.

Brian

> On Jan 6, 2022, at 10:03 AM, Christiaan Hofman <cmhof...@gmail.com> wrote:
>
>> On 5 Dec 2021, at 20:17, Arrigo Benedetti <arrigo.benede...@gmail.com> wrote:
>>
>> Brian,
>>
>> Is yours an AppleScript script? I think I want to do the same thing that
>> your script does, but since I do not know AppleScript I was planning to
>> create a Python script that generates the .bib file. Is your script
>> available to the community?
>>
>> Thanks,
>>
>> -Arrigo
>>
>> On Sun, Dec 5, 2021 at 11:07 AM Brian Helenbrook <bhele...@clarkson.edu> wrote:
>> I’m not sure exactly what you are trying to do, but I have a script that
>> imports a folder of PDF files and then grabs the bibliographic information
>> from Google Scholar for each file.
>>
>>> On Dec 3, 2021, at 4:53 PM, Arrigo Benedetti <arrigo.benede...@gmail.com> wrote:
>>>
>>> I want to write a Python script that will process a large number of PDF
>>> files, extract the relevant information (DOI, etc.), and create a BibTeX
>>> archive so I can use BibDesk. I understand that the path to the PDF is
>>> stored in the bdsk-file-1 field, and I was able to decode it with the
>>> Python code discussed at
>>> https://inkdroid.org/2020/09/03/bibdesk-and-zotero/
>>> The decoded plist has the fields 'relativePath' and 'aliasData', where
>>> relativePath is obviously the relative path to the PDF file. I'm wondering
>>> whether I should create the aliasData field and, if so, what I should put
>>> there. I hope that it's clear what I want to do: create a BibTeX file that
>>> BibDesk will be able to read and work on, starting from a large number of
>>> PDF files. I see this as mostly a one-time operation, just to avoid
>>> manually creating thousands of entries in BibDesk. I'm planning to post
>>> the code to GitHub when this project is completed.
>>>
>>> Thanks much,
>>>
>>> -Arrigo
>>>
>
> I don’t know if you got any further with this. But perhaps you may want to
> know that BibDesk already tries to get DOIs and such to generate
> bibliographic information for PDFs added to the database by default, for
> instance by dropping them on the main table. This should also happen when
> you add files using the AppleScript ‘import’ command. Will that give you the
> information that you want?
>
> Christiaan
>
> _______________________________________________
> Bibdesk-develop mailing list
> Bibdesk-develop@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
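
For anyone following the thread, the title heuristic Brian describes above (take the text with the largest font size on the first page) could be sketched roughly as follows. This is not Brian's script, which is not posted in this thread. It is a minimal Python sketch that assumes the first page has already been converted to HTML with inline font-size styles, and it leaves out the "other logic" and the Google Scholar lookup. The names FontSizeCollector and guess_title are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Matches e.g. "font-size:18px" or "font-size: 10.5pt" in a style attribute.
FONT_SIZE = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)")

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class FontSizeCollector(HTMLParser):
    """Collect (font_size, text) pairs from inline font-size styles (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._sizes = []  # font size in effect for each open element
        self.runs = []    # (font_size, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        match = FONT_SIZE.search(style)
        inherited = self._sizes[-1] if self._sizes else 0.0
        self._sizes.append(float(match.group(1)) if match else inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._sizes:
            self._sizes.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        size = self._sizes[-1] if self._sizes else 0.0
        # Ignore text whose font size is unknown.
        if text and size > 0:
            self.runs.append((size, text))


def guess_title(html):
    """Return the text set in the largest font size, or None if no sizes are found."""
    collector = FontSizeCollector()
    collector.feed(html)
    if not collector.runs:
        return None
    largest = max(size for size, _ in collector.runs)
    # A title may be split over several runs of the same size, so join them.
    return " ".join(text for size, text in collector.runs if size == largest)


if __name__ == "__main__":
    # Made-up sample markup standing in for a converted first page.
    page = (
        '<div><span style="font-size:9px">Journal of Examples</span><br>'
        '<span style="font-size:18px">A Sample</span>'
        '<span style="font-size:18px">Paper Title</span>'
        '<span style="font-size:11px">A. Author</span></div>'
    )
    print(guess_title(page))
```

A scanned (bitmap) PDF would yield no text runs at all, so guess_title returns None in that case, which matches the limitation Brian mentions.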