It takes the text at the beginning of the PDF and looks for certain identifiers (such as a DOI or PMID) using regular expressions. If it finds one, it fetches the bibliographic information from the appropriate source.
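A minimal sketch of the kind of identifier scan described above, assuming the text of the PDF's first page has already been extracted. The DOI and PMID patterns, the helper name find_identifier, and the max_chars cutoff are hypothetical illustrations, not BibDesk's actual regexes. The follow-up step that fetches metadata from the matching source is not shown.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration; BibDesk's real regexes may differ.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def find_identifier(text: str, max_chars: int = 5000) -> Optional[Tuple[str, str]]:
    """Return ("doi", value) or ("pmid", value) for the first identifier found, else None.

    Only the first max_chars characters are scanned, mimicking a look at
    "the text at the beginning" of the PDF (the cutoff is an assumption).
    """
    head = text[:max_chars]
    doi = DOI_RE.search(head)
    if doi:
        # The greedy pattern can swallow sentence punctuation; strip it off.
        return ("doi", doi.group(1).rstrip(".,;"))
    pmid = PMID_RE.search(head)
    if pmid:
        return ("pmid", pmid.group(1))
    return None
```

Checking for a DOI before a PMID is an arbitrary choice in this sketch; a real importer might prefer whichever identifier its metadata source handles best.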
Christiaan

> On 6 Jan 2022, at 16:36, Brian Helenbrook <bhele...@clarkson.edu> wrote:
>
> I didn’t know it did that. What is required for this to work? I tried a bunch of PDF files, and it worked for only one of the four.
>
> My script converts the front page of the PDF to HTML and then basically finds the text with the largest font size (with some other logic) and assumes that it is the title. It then uses Google Scholar to grab the bibliographic information based on the title. As long as the file is not so old that it is a bitmap, it is pretty reliable.
>
> Brian
>
>> On Jan 6, 2022, at 10:03 AM, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>
>>> On 5 Dec 2021, at 20:17, Arrigo Benedetti <arrigo.benede...@gmail.com> wrote:
>>>
>>> Brian,
>>>
>>> Is yours an AppleScript script? I think I want to do the same thing that your script does, but since I do not know AppleScript I was planning to create a Python script that generates the .bib file. Is your script available to the community?
>>>
>>> Thanks,
>>>
>>> -Arrigo
>>>
>>> On Sun, Dec 5, 2021 at 11:07 AM Brian Helenbrook <bhele...@clarkson.edu> wrote:
>>> I’m not sure exactly what you are trying to do, but I have a script that imports a folder of PDF files and then grabs the bibliographic information from Google Scholar for each file.
>>>
>>>> On 3 Dec 2021, at 4:53 PM, Arrigo Benedetti <arrigo.benede...@gmail.com> wrote:
>>>>
>>>> I want to write a Python script that will process a large number of PDF files, extract the relevant information (DOI, etc.), and create a BibTeX archive so I can use BibDesk.
>>>> I understand that the path to the PDF is stored in the bdsk-file-1 field, and I was able to decode it with the Python code discussed at https://inkdroid.org/2020/09/03/bibdesk-and-zotero/
>>>> The decoded plist has the fields 'relativePath' and 'aliasData', where relativePath is obviously the relative path to the PDF file. I'm wondering whether I should create the aliasData field and, if so, what I should put there. I hope it's clear what I want to do: create a BibTeX file that BibDesk will be able to read and work on, starting from a large number of PDF files. I see this mostly as a one-time operation, just to avoid creating thousands of entries manually in BibDesk. I'm planning to post the code to GitHub when this project is completed.
>>>>
>>>> Thanks much,
>>>>
>>>> -Arrigo
>>>>
>>
>> I don’t know if you got any further with this, but you may want to know that BibDesk already tries by default to get DOIs and similar identifiers to generate bibliographic information for PDFs added to the database, for instance by dropping them on the main table. This should also happen when you add files using the AppleScript ‘import’ command. Will that give you the information that you want?
>>
>> Christiaan
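Brian's title heuristic (take the text set in the largest font on the front page) can be sketched as follows, assuming the front page has already been converted into (text, font_size) spans by some PDF-to-HTML step that is not shown. The function guess_title and its min_chars filter are hypothetical stand-ins for his "some other logic", and the Google Scholar lookup is left out.

```python
from typing import List, Tuple


def guess_title(spans: List[Tuple[str, float]], min_chars: int = 10) -> str:
    """Guess a paper title from (text, font_size) spans on the first page.

    Keeps the text set in the largest font. The min_chars filter is a
    hypothetical extra rule that skips short spans such as oversized page
    numbers or logo text.
    """
    candidates = [
        (text.strip(), size) for text, size in spans if len(text.strip()) >= min_chars
    ]
    if not candidates:
        return ""
    largest = max(size for _, size in candidates)
    # A title often wraps over several lines, so join every span at the largest size.
    return " ".join(text for text, size in candidates if size == largest)
```

Joining all spans at the largest size handles titles that wrap across lines; it can still pick the wrong text when a journal name is set larger than the title, which is presumably the kind of case the extra logic guards against.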
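For reading an existing bdsk-file-1 value, a sketch along the lines Arrigo describes: base64-decode the field and parse the bytes as a property list. This assumes that encoding, following the inkdroid post; since the thread does not spell out the exact archive layout, the hypothetical helper decode_bdsk_file searches the decoded plist recursively for a relativePath entry and follows plistlib.UID references (Python 3.8+) in case the data is an NSKeyedArchiver-style archive. It only reads the field and says nothing about what BibDesk expects in aliasData.

```python
import base64
import plistlib
from typing import Any, List, Optional


def _find_relative_path(node: Any, objects: List[Any]) -> Optional[str]:
    """Recursively search a decoded plist for a 'relativePath' string."""
    if isinstance(node, dict):
        if "relativePath" in node:
            value = node["relativePath"]
            # In a keyed archive the value is a UID index into the $objects list.
            if isinstance(value, plistlib.UID) and value.data < len(objects):
                value = objects[value.data]
            if isinstance(value, str):
                return value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_relative_path(child, objects)
        if found is not None:
            return found
    return None


def decode_bdsk_file(field_value: str) -> Optional[str]:
    """Decode a base64-encoded plist (assumed bdsk-file-N format) and return relativePath."""
    data = plistlib.loads(base64.b64decode(field_value))
    objects = data.get("$objects", []) if isinstance(data, dict) else []
    return _find_relative_path(data, objects)
```

Searching recursively rather than indexing a fixed key path keeps the sketch working whether the decoded plist is a flat dictionary or a keyed archive.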
_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop