David #19: you say "it perhaps recognizes column stuff from the display
layout instead of the internal representation."

In PDF, the internal representation *is* just the display layout.
Internally, poppler tries to divide the text on a page into blocks
(roughly paragraphs), which are then grouped into columns based on
spacing and, independently, into 'flows' (roughly, sequences of similar
blocks in reading order), all based on a bunch of heuristics. This is
already tricky, and it is made more complicated by text rotation and by
different writing systems (vertical, right to left, etc). Acrobat and
Apple's Preview use different heuristics, so they group text differently
and fail on different documents than poppler does, but they still make a
mess of things.
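
To make "grouped into columns based on spacing" concrete, here is a toy
sketch in Python. It is not poppler's actual code: the function name
group_into_columns, the (x_min, y_min, x_max, y_max) box layout and the
min_gap threshold are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[float, float, float, float]


def group_into_columns(blocks: List[Box], min_gap: float = 20.0) -> List[List[Box]]:
    """Start a new column wherever a horizontal gap of at least min_gap appears."""
    columns: List[List[Box]] = []
    right_edge = None  # rightmost x seen so far in the current column
    for block in sorted(blocks, key=lambda b: b[0]):
        if right_edge is None or block[0] - right_edge >= min_gap:
            columns.append([block])
            right_edge = block[2]
        else:
            columns[-1].append(block)
            right_edge = max(right_edge, block[2])
    for column in columns:
        column.sort(key=lambda b: b[1])  # top to bottom within a column
    return columns


two_column_page = [
    (300, 50, 500, 100), (50, 50, 250, 100),
    (50, 120, 250, 200), (300, 120, 500, 200),
]
columns = group_into_columns(two_column_page)

# A single full-width heading hides the gap, so everything collapses
# into one column: the kind of case where a simple heuristic breaks.
with_heading = group_into_columns([(50, 10, 500, 40)] + two_column_page)
```

On the plain two-column page this finds two columns. Adding one
full-width heading is enough to merge them, which is why real
implementations pile further heuristics on top.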

Just explaining what's going on here; this isn't to say that text
selection can't be improved. I'm slowly putting together a patch based
on the reading order sort described in
http://pubs.iupr.org/#2003-breuel-sdiut , which seems to be fixing some
of the problems with the attachment in #7. However, as I said to Andres,
I have no idea when or if my patches would be accepted.
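
For anyone curious what a reading order sort looks like, below is a
rough Python sketch of the idea as I understand it from that paper. It
is not the patch itself. Two pairwise rules define a partial order on
text boxes, and a topological sort turns that into a sequence. The
names (comes_before, reading_order), the box layout and the exact
meaning of "vertically between" are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[float, float, float, float]


def x_overlap(a: Box, b: Box) -> bool:
    """True if the horizontal extents of a and b intersect."""
    return a[0] < b[2] and b[0] < a[2]


def comes_before(a: Box, b: Box, boxes: List[Box]) -> bool:
    # Rule 1: the boxes overlap horizontally and a is above b.
    if x_overlap(a, b):
        return a[1] < b[1]
    # Rule 2: a is entirely left of b, and no box c lies vertically
    # between them while overlapping both horizontally (for example a
    # full-width heading that separates two sections).
    if a[2] <= b[0]:
        lo, hi = sorted((a[1], b[1]))
        return not any(
            lo < c[1] < hi and x_overlap(c, a) and x_overlap(c, b)
            for c in boxes
        )
    return False


def reading_order(boxes: List[Box]) -> List[Box]:
    """Topologically sort boxes by comes_before (Kahn's algorithm)."""
    n = len(boxes)
    successors: List[List[int]] = [[] for _ in range(n)]
    indegree = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and comes_before(boxes[i], boxes[j], boxes):
                successors[i].append(j)
                indegree[j] += 1

    def tie_break(i: int) -> Tuple[float, float]:
        return (boxes[i][1], boxes[i][0])  # top-most, then left-most

    ready = sorted((i for i in range(n) if indegree[i] == 0), key=tie_break)
    order: List[Box] = []
    while ready:
        i = ready.pop(0)
        order.append(boxes[i])
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
        ready.sort(key=tie_break)
    if len(order) != n:
        raise ValueError("pairwise rules produced a cycle")
    return order


heading = (50, 10, 500, 40)
left_a, left_b = (50, 50, 250, 65), (50, 70, 250, 85)
right_a, right_b = (300, 50, 500, 65), (300, 70, 500, 85)
page = [right_b, left_b, heading, right_a, left_a]

ordered = reading_order(page)
naive = sorted(page, key=lambda b: (b[1], b[0]))  # plain top-to-bottom sort
```

On this page the sketch yields the heading, then the whole left column,
then the whole right column, whereas the plain top-to-bottom sort
interleaves lines from the two columns.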

-- 
Evince doesn't handle columns properly
https://bugs.launchpad.net/bugs/33288
