On 2015-05-07 05:07PM, Jochen Sprickerhof wrote:
> * Jason Woofenden <[email protected]> [2015-05-07 10:09]:
> >     pdftohtml -stdout foo.pdf | sed -ne 's/href="\([^"]\+\)"/\n\1\n/g' -e 's/\(^[^\n]*\n\|\(\n\)\)\([^\n]*\)\n[^\n]*/\2\3/gp'
> 
> I would use grep ;). Using my urlselect from [1] I would write:
> 
> pdftotext foo.pdf - | urlselect
> 
> Cheers Jochen
> 
> [1] http://lists.suckless.org/dev/1504/26641.html

Ooh, grep -o is great! So I guess my sed trick is only needed if
you want to print less than the whole match, or need to do some
sort of transformation (e.g. lowercasing part of it).
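
A rough sketch of that split, not taken from urlselect itself: the
sample text, the URL pattern and the variable names are all made up
for illustration, and it assumes a grep that supports -o (GNU and
BSD grep both do). grep -o handles the whole-match case; sed and tr
only come in for printing part of the match or transforming it.

```shell
# Hypothetical sample input, standing in for `pdftotext foo.pdf -` output.
text='See http://Example.COM/a and https://suckless.org/ for details.'

# grep -o prints each whole match on its own line.
# The URL pattern is a deliberately crude assumption.
urls=$(printf '%s\n' "$text" | grep -oE 'https?://[^[:space:]]+')

# Printing less than the whole match (just the host) needs sed.
hosts=$(printf '%s\n' "$urls" | sed -E 's|^https?://([^/]+).*|\1|')

# A transformation, here lowercasing the extracted hosts with tr.
lower=$(printf '%s\n' "$hosts" | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$urls" "$hosts" "$lower"
```

With real input, the first pipeline would start from
pdftotext foo.pdf - instead of the printf.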

-- 
Jason
