On Tue 20 Sep 2016 at 11:01:09 -0500, David Wright wrote:
> Well, I did write in
> that "This is one area where a bit of experimentation will help much
> more than trying to understand the scattered documentation."
I'm unsure whether the issue is copy/paste with a mouse or the nature of
the PDF/PS file. Scattered documentation for both cases doesn't help.
> On Tue 20 Sep 2016 at 15:08:58 (+0100), Brian wrote:
> > On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:
> > > My own experience is all or nothing. What I get correlates with the
> > > output of pdftotext; if that can extract the text, I can copy it
> > > with the mouse, if not then I can't. PDFs I produce with paps, for
> > > example, don't work: I don't know why this is the case.
> > How do you produce a PDF using paps?
> Sorry, missed out a step. The paps output is filtered through ps2pdf
> so that could explain a lot. Thanks for reminding me. (The clue is in
> the name!)
I thought you had but wanted to check. Does paps produce a searchable PS
file? My quick tests with evince and okular indicate it doesn't. If not,
ps2pdf isn't likely to produce a PDF with extractable text.
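
The pdftotext correlation mentioned earlier in the thread can be turned
into a quick check. What follows is only a rough sketch, assuming the
pdftotext binary (poppler-utils on Debian) is on PATH; the function
names are made up for illustration, and treating "only whitespace and
form feeds came out" as "no extractable text" is my assumption, not
anything the viewers are documented to do.

```python
import shutil
import subprocess


def looks_extractable(text: str, min_chars: int = 1) -> bool:
    """Return True if text has at least min_chars non-whitespace characters.

    pdftotext separates pages with form feeds, which count as whitespace
    here, so output consisting only of page breaks is treated as empty.
    """
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def pdf_text(path: str) -> str:
    """Run `pdftotext PATH -` and return whatever it writes to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def check(path: str) -> str:
    """Hypothetical helper: one-line verdict for a single PDF file."""
    verdict = "extractable" if looks_extractable(pdf_text(path)) else "no text"
    return f"{path}: {verdict}"
```

Calling check() on a paps | ps2pdf file and on a PDF known to copy
cleanly should show whether the two behave differently.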
> > https://github.com/angea/PDF101/tree/master/handcoded/textextract
> > is of interest.
> Useful reference, thanks.
There is much more to it than that. But it can be saved for another day.
> > > My experience here is similar to xpdf but with a few differences: when
> > > it works (the same files do), the selection is line by line (ie like
> > > an xterm) rather than a strict rectangle; if it can't do it, it
> > > doesn't highlight (whereas xpdf "lies": it highlights but fails to
> > > copy); the highlighting may be coloured (white→blue, black→white) or
> > > black (which hides the text).
> > Evince seems to be aware if *all* the text is not copiable and will then
> > not allow it to be selected. It does not appear to be aware when only
> > portions of a document are not copiable/searchable and these portions
> > are selectable.
> Well, man xpdf says baldly "Dragging the mouse with the left
> button held down will highlight an arbitrary rectangle." I guess I
> hadn't realised just how bald that rectangle can be.
> It's tedious ascertaining anything about xpdf in the "jessie period"
> because so much of it is broken; I have to repeat everything in
> wheezy to make sure the problem is ephemeral. (Will these problems
> go away?)
xpdf had some loving care in the past; it could do with more. I like the
program but these days tend to use mupdf.