On Mon, Apr 04, 2016 at 09:56:55PM +0100, Ken Moffat wrote:
> On Mon, Apr 04, 2016 at 01:24:47PM -0700, Paul Rogers wrote:
(replying to me)
> > > Now that I need to again copy from PDFs, on 7.9, I do not seem to
> > > have any PDF viewer which will let me copy. Has anybody got a
> > > current version of a PDF viewer that lets them do this ?
> > >
> > > So far, epdfview, evince, mupdf, okular (kf5) do not seem to let me
> > > do this.
> >
> > Ken, I have been able to cut/paste text in epdfview. But it only does
> > rectangular blocks, not line oriented. So I have to go
> > margin-to-margin.
>
> Paul, thanks, but I can't seem to do that.
>
[snip]
>
> Actually, I might just give up - too many other things to do, and if
> I _can_ get the full text I will then need to see if I can identify
> the correct lines for Article 1.
>
Coming back to this: tl;dr - "not always possible".

So, I got to a point where I was fed up with not having text (UDHR,
Article 1) in some indic and S.E. Asian scripts. But I was starting
to make sense of some of the indic numbers, to help me identify
which part of the text was Article '1'. When I did this before
("font analysis" i.e. what it can cover, and how it looks) it was on
the same machine, but almost 3 years ago (judging from the odt files
I created in libreoffice).

Unfortunately, the system I used to do that is long gone - probably
because I had to resize my partitions to include texlive, and at
that time it was already old. Oh well - I'm now using xelatex for
the PDFs of which languages a font covers, because libreoffice uses
fontconfig to pull in whatever covers a missing glyph, so I suppose
I get a net benefit ;-p

There was a report that libreoffice itself can open PDFs (I did not
know that - they are the last item on the list of file types), but
it just gave me a few random ASCII characters (from a Kayah Li - or
Karen - PDF).

It looks as if I was using either evince-3.6 or evince-3.4 in the
past. Google found a debian bug report suggesting that -3.8 was
when the evince interface got changed. So, I tried building
evince-3.6.1 (I have the minimal deps for whichever version was in
BLFS-7.9, I guess that is 3.18) and putting it in /opt/oldevince.
The good news: it let me paste, as in the past. The bad: I just got
the same ASCII characters. Then I tried 3.4.0 (that box used to
only have stable LFS releases, possibly it was running that at the
time). Same result. After that I went to another machine and was
lucky enough to find an unmaintained LFS-7.2 system (unmaintained
because of the glibc vulnerability last year, and I had not built a
new system on it since then). That too had 3.4.0 and gave the same
result.

Google found some recommendations to use abiword, and on my LFS-7.2
system I had that - it opened, showing some document header
information (it was produced by an Adobe program, and showed the
user who created it) with similar ASCII garbage for the text.

A bit more googling found reports that whether or not you can paste
the text from an Adobe PDF depends on whether or not it is locked
<sigh/>. Also reports (re abiword, above) that libreoffice has
problems pasting indic and S.E. Asian scripts, inherited from OOo.

So, although I am fairly sure that I managed to paste some non-latin
text from a PDF opened in evince in the past, I am no longer certain
and more importantly, it seems that some PDFs cannot be pasted from
using libre PDF readers. At this point, I am giving up and
recording why.
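
A rough way to check whether pasted or extracted text really contains
the expected script, rather than ASCII garbage, can be sketched in
Python. This is an illustration only, not part of the steps described
above: the function name script_share is hypothetical, and it assumes
that the Unicode character names of the script begin with the script
name (true for e.g. "DEVANAGARI" or "KAYAH LI", but not something to
rely on for every block).

```python
import unicodedata


def script_share(text, script_prefix):
    """Return the fraction of letters/marks in `text` whose Unicode
    character name starts with `script_prefix` (e.g. "DEVANAGARI").

    Hypothetical helper: assumes the script's character names begin
    with the script name.
    """
    # Only consider letters (L*) and combining marks (M*); digits,
    # punctuation and whitespace are ignored.
    letters = [ch for ch in text
               if unicodedata.category(ch).startswith(("L", "M"))]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters
               if unicodedata.name(ch, "").startswith(script_prefix))
    return hits / len(letters)


if __name__ == "__main__":
    pasted = "Q#x7"  # the sort of ASCII garbage a bad paste produces
    print(script_share(pasted, "DEVANAGARI"))
```

A share close to 0.0 for the script you expected suggests that the
text you got out of the PDF is not usable, whichever viewer produced it.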
Now, I'll return you to your normal program, and I'll go back to
trying to make sense of the many Noto fonts - whoever created them
obviously thought it was important to cover *everything* in unicode,
even if the potential audience is in the hundreds (e.g. Cuneiform).

ĸen
--
This email was written using 100% recycled letters.
--
http://lists.linuxfromscratch.org/listinfo/blfs-support