Hi! May I suggest considering Poppler for this project, even though it seems to be written in C++? It looks to me like a library that already does most of the work, so the intern wouldn't have to reinvent the wheel, as we say.
I wasn't sure whether I should add it to the wiki page, or where to add it. Its Wikipedia page gives some general information: https://en.wikipedia.org/wiki/Poppler_(software)
And here is the project website: https://poppler.freedesktop.org/
Note that the project is licensed under the GPL, and it even has a Debian package.

Best regards,
Renata