Hi Hypsurus,

I hope you're having fun coding. Don't let me detract from that.
But if you just need to extract links from PDFs, you can do so with
existing tools, e.g.:

pdftohtml -stdout foo.pdf |
  sed -ne 's/\(^\|\n\)\n\([^\n]*\)\n[^\n]*/\1\2/gp; t; s/href="\([^"]\+\)"/\n\n\1\n/g; D'

Sorry if that sed thing is more complex than it needs to be. I'm
just learning the other sed commands besides s///.

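The extra complexity with the "\n"s is to handle multiple links on
the same line: the second s/// wraps each href value in newlines, D
restarts the cycle on what is left, and the first s/// then keeps only
the wrapped values, one per line.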
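
For comparison, here is a rough sketch of the same idea without the
newline juggling. It assumes a grep that supports -o (GNU and BSD grep
both do) and that pdftohtml writes links as href="..." with double
quotes; extract_hrefs is just a name I made up for illustration.

```shell
# Hypothetical helper: read HTML on stdin and print every href="..."
# value, one per line. Multiple links on one input line are handled
# because grep -o prints each match on its own line.
extract_hrefs() {
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Usage (assumes pdftohtml is installed):
#   pdftohtml -stdout foo.pdf | extract_hrefs
```

Unlike the sed one-liner above, this leans on grep to split out the
matches, so sed only has to strip the href=" prefix and closing quote.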

-- 
Jason
