Hi Hypsurus, I hope you're having fun coding. Don't let me detract from that. But if you just need to extract links from PDFs, you can do so with existing tools, e.g.:
pdftohtml -stdout foo.pdf | sed -ne 's/\(^\|\n\)\n\([^\n]*\)\n[^\n]*/\1\2/gp; t; s/href="\([^"]\+\)"/\n\n\1\n/g; D'

Sorry if that sed thing is more complex than it needs to be. I'm just learning the other sed commands besides s///. The extra complexity with the "\n"s is to handle multiple links on the same line.

-- Jason
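
P.S. In case the sed pipeline above is more than you need, here is a rough sketch of a simpler variant. It assumes your grep supports -o (GNU and BSD grep both do), which prints each match on its own line, so several links on one line need no special handling. The function name extract_links and the sample HTML are made up for illustration, and I have only tried the idea on simple input like this, not on real pdftohtml output.

```shell
# Hypothetical helper (not part of pdftohtml or sed): read HTML on stdin
# and print the value of every href="..." attribute, one per line.
extract_links() {
    # grep -o emits each non-empty href="..." match on a separate line,
    # even when several appear on the same input line.
    # sed then strips the leading href=" and the trailing quote.
    grep -o 'href="[^"][^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Made-up sample input with two links on one line.
printf '%s\n' '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>' | extract_links
```

With a real file it should be usable the same way as the original command: pdftohtml -stdout foo.pdf | extract_links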
