Re: [blfs-support] Copying from PDFs ?

Ken Moffat Tue, 12 Apr 2016 20:55:48 -0700

On Mon, Apr 04, 2016 at 09:56:55PM +0100, Ken Moffat wrote:
> On Mon, Apr 04, 2016 at 01:24:47PM -0700, Paul Rogers wrote:
(replying to me)
> > > Now that I need to again copy from PDFs, on 7.9, I do not seem to
> > > have any PDF viewer which will let me copy.  Has anybody got a
> > > current version of a PDF viewer that lets them do this ?
> > > 
> > > So far, epdfview, evince, mupdf, okular (kf5) do not seem to let me
> > > do this.
> > 
> > Ken, I have been able to cut/paste text in epdfview.  But it only does
> > rectangular blocks, not line oriented.  So I have to go
> > margin-to-margin.
> 
> Paul, thanks, but I can't seem to do that.
> 
[snip]
> 
> Actually, I might just give up - too many other things to do, and if
> I _can_ get the full text I will then need to see if I can identify
> the correct lines for Article 1.
> 
Coming back to this: tl;dr - "not always possible".


So, I got to a point where I was fed up with not having text (UDHR,
Article 1) in some Indic and S.E. Asian scripts.  But I was starting
to make sense of some of the Indic numbers, to help me identify
which part of the text was Article '1'.  When I did this before
("font analysis" i.e. what it can cover, and how it looks) it was on
the same machine, but almost 3 years ago (judging from the odt files
I created in libreoffice).

Unfortunately, the system I used to do that is long gone - probably
because I had to resize my partitions to include texlive, and at
that time it was already old.  Oh well - I'm now using xelatex for
the PDFs of which languages a font covers, because libreoffice uses
fontconfig to pull in whatever covers a missing glyph, so I suppose
I get a net benefit ;-p

There was a report that libreoffice itself can open PDFs (I did not
know that - they are the last item on the list of file types), but
it just gave me a few random ASCII characters (from a Kayah Li - or
Karen - PDF).

It looks as if I was using either evince-3.6 or evince-3.4 in the
past.  Google found a debian bug report suggesting that -3.8 was
when the evince interface got changed.  So, I tried building
evince-3.6.1 (I have the minimal deps for whichever version was in
BLFS-7.9, I guess that is 3.18) and putting it in /opt/oldevince.

The good news: it let me paste, as in the past.  The bad: I just got
the same ASCII characters.  Then I tried 3.4.0 (that box used to
only have stable LFS releases, possibly it was running that at the
time).  Same result.  After that I went to another machine and was
lucky enough to find an unmaintained LFS-7.2 system (unmaintained
because of the glibc vulnerability last year, and I had not built a
new system on it since then).  That too had 3.4.0 and gave the same
result.

Google found some recommendations to use abiword, and on my LFS-7.2
system I had that - it opened, showing some document header
information (it was produced by an Adobe program, and showed the
user who created it) with similar ASCII garbage for the text.

A bit more googling found reports that whether or not you can paste
the text from an Adobe PDF depends on whether or not it is locked
<sigh/>.  Also reports (re abiword, above) that libreoffice has
problems pasting Indic and S.E. Asian scripts, inherited from OOo.
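
For anyone wanting to check a particular file, here is a minimal
Python sketch of that "is it locked" test.  It is only a heuristic and
not something that was run as part of the investigation above.  It
assumes that a locked PDF shows an /Encrypt entry in its trailer, and
that fonts whose text can be mapped back to Unicode usually carry a
/ToUnicode entry.  The function name pdf_copy_hints is made up for
illustration.  It scans raw bytes rather than parsing the file, so a
missing /ToUnicode is inconclusive when the PDF keeps its objects in
compressed object streams, and an /Encrypt entry does not by itself
prove that copying is forbidden.

```python
import re


def pdf_copy_hints(data: bytes) -> dict:
    """Heuristic scan of raw PDF bytes (hypothetical helper, not a parser).

    Returns three booleans:
      looks_like_pdf    - a %PDF- header appears near the start
      has_encrypt_entry - an /Encrypt name is present (file may be locked)
      has_tounicode_map - at least one /ToUnicode name is visible
    """
    return {
        # The header is normally at offset 0, but allow some leading junk.
        "looks_like_pdf": b"%PDF-" in data[:1024],
        # \b stops /EncryptMetadata from being counted as /Encrypt.
        "has_encrypt_entry": re.search(rb"/Encrypt\b", data) is not None,
        # Only visible when the font dictionary is stored uncompressed.
        "has_tounicode_map": re.search(rb"/ToUnicode\b", data) is not None,
    }
```

To try it, pass in the result of open(path, "rb").read().  If
has_encrypt_entry is True, or has_tounicode_map is False, that would be
consistent with getting ASCII garbage when pasting, but neither is
proof.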

So, although I am fairly sure that I managed to paste some non-latin
text from a PDF opened in evince in the past, I am no longer certain
and more importantly, it seems that some PDFs cannot be pasted from
using libre PDF readers.  At this point, I am giving up and
recording why.

Now, I'll return you to your normal program, and I'll go back to
trying to make sense of the many Noto fonts - whoever created them
obviously thought it was important to cover *everything* in Unicode,
even if the potential audience is in the hundreds (e.g. Cuneiform).

ĸen
-- 
This email was written using 100% recycled letters.
-- 
http://lists.linuxfromscratch.org/listinfo/blfs-support
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page
