It takes the text at the beginning of the PDF and looks for certain 
identifiers (like DOI or PMID) using regexes. If it finds one, it fetches 
bibliographic info from the appropriate source.
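
For anyone who wants to do the same kind of lookup in a Python script, here 
is a rough sketch of the regex step only. The two patterns, the function name 
find_identifier and the max_chars cutoff are my own assumptions for 
illustration; they are not the patterns BibDesk actually uses, and the 
fetching step is left out.

```python
import re

# Assumed patterns for illustration; BibDesk's real ones may differ.
# A DOI starts with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')
# A PMID is matched here only when it is labelled, e.g. "PMID: 12345678".
PMID_RE = re.compile(r'\bPMID:?\s*(\d{1,8})\b', re.IGNORECASE)


def find_identifier(text, max_chars=5000):
    """Return (kind, value) for the first DOI or PMID found near the start
    of the text, or None if neither is present."""
    head = text[:max_chars]  # only look at the beginning of the document

    match = DOI_RE.search(head)
    if match:
        # Drop punctuation that often trails a DOI in running text.
        return ("doi", match.group(1).rstrip(".,;)"))

    match = PMID_RE.search(head)
    if match:
        return ("pmid", match.group(1))

    return None
```

The text passed in would come from whatever PDF text extraction you already 
have. The DOI is tried first, so a page that carries both identifiers yields 
the DOI.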

Christiaan

> On 6 Jan 2022, at 16:36, Brian Helenbrook <bhele...@clarkson.edu> wrote:
> 
> I didn’t know it did that.  What is the requirement for this to work?  I 
> tried a bunch of pdf files and only one of the 4 files I tried worked.  
> 
> My script converts the front page of the pdf to html and then basically finds 
> the text with the largest font size (with some other logic) and assumes that 
> is the title.  It then uses Google Scholar to grab the bibliographic 
> information based on the title.  As long as the file is not so old that it is 
> a bitmap, it is pretty reliable. 
> 
> Brian
> 
> 
> 
>> On Jan 6, 2022, at 10:03 AM, Christiaan Hofman <cmhof...@gmail.com 
>> <mailto:cmhof...@gmail.com>> wrote:
>> 
>> 
>> 
>>> On 5 Dec 2021, at 20:17, Arrigo Benedetti <arrigo.benede...@gmail.com 
>>> <mailto:arrigo.benede...@gmail.com>> wrote:
>>> 
>>> Brian,
>>> 
>>> Is yours an AppleScript script? I think I want to do the same thing that 
>>> your script does but since I do not know AppleScript I was planning to 
>>> create a python script that generates the .bib file. Is your script 
>>> available to the community?
>>> 
>>> Thanks,
>>> 
>>> -Arrigo
>>> 
>>> On Sun, Dec 5, 2021 at 11:07 AM Brian Helenbrook <bhele...@clarkson.edu 
>>> <mailto:bhele...@clarkson.edu>> wrote:
>>> I’m not sure exactly what you are trying to do, but I have a script that 
>>> imports a folder of pdf files then grabs the bibliographic information from 
>>> Google Scholar for each file.
>>> 
>>>> On 3 Dec 2021, at 4:53 PM, Arrigo Benedetti <arrigo.benede...@gmail.com 
>>>> <mailto:arrigo.benede...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> I want to write a python script that will process a large number of PDF 
>>>> files, extract the relevant information like DOI, etc and create a bibtex 
>>>> archive so I can use BibDesk. I understand that the path to the PDF is 
>>>> stored in the bdsk-file-1 and I was able to decode it with the python code 
>>>> discussed at https://inkdroid.org/2020/09/03/bibdesk-and-zotero/ 
>>>> <https://inkdroid.org/2020/09/03/bibdesk-and-zotero/>
>>>> The decoded plist has the field: 'relativePath', 'aliasData' where 
>>>> relativePath is obviously the relative path to the PDF file. I'm wondering 
>>>> if I should create the aliasData field and what should I put there. I hope 
>>>> that it's clear what I want to do: to create a bibtex file that BibDesk 
>>>> will be able to read and work on starting from a large number of PDF 
>>>> files. I see this for the most part as a one-time operation just to avoid the 
>>>> manual creation of thousands of entries with BibDesk. I'm planning to post 
>>>> the code to github when this project is completed.
>>>> 
>>>> Thanks much,
>>>> 
>>>> -Arrigo
>>>>  
>> 
>> I don’t know if you got any further with this. But perhaps you may want to 
>> know that BibDesk already tries to get DOIs and such to generate 
>> bibliographic information for PDFs added to the database by default, for 
>> instance by dropping them on the main table. This should also happen when 
>> you add files using the AppleScript ‘import’ command. Will that give you the 
>> information that you want?
>> 
>> Christiaan

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
