[Libreoffice-bugs] [Bug 66597] Other: Text-copy problems with Hindi text copied from PDF

bugzilla-daemon Thu, 04 Jul 2013 14:26:43 -0700

https://bugs.freedesktop.org/show_bug.cgi?id=66597


--- Comment #5 from Steve White <[email protected]> ---
Hi Khaled.

Of course we're aware that copying text from PDF is unreliable.
In fact, with the current technology, based on ToUnicode, it is impossible to
reproduce the original text.

I am sure however, in the case of Indic scripts, it could be done in such a way
that results in mostly readable text.

The reason I submitted this report to LibreOffice is that this product does the
best job of the several approaches I tested.  I think it could be improved with
the least effort, and serve as a model for other systems.

Regarding the AGLFN, as I said, it could be used to break a tie, but
otherwise, you should reconsider your statements.  The AGLFN cannot carry more
information than the ToUnicode stream does, and OpenType feature tables carry
more information than either can.  The best approach would be to judiciously
use the OpenType features to populate the ToUnicode stream.

As I said, the AGLFN could be used to break a tie in OpenType feature tables. 
But if it conflicts with the feature tables, it cannot be right.  (And in fact,
that's what my tests showed: technologies that relied on AGLFN often showed
mistakes because of failure to code a glyph name...which is a pity because
correct info was available.) It would be better to drop the technology.
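
To make the tie-breaking idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration: the function names, the
table layout, and the sample glyphs are hypothetical, and the substitution
table is only a simplified stand-in for real OpenType feature tables. It
derives candidate texts for a glyph from the feature data, consults an
Adobe-style "uniXXXX" glyph name only when the features leave more than one
candidate, and ignores the name whenever it conflicts with them.

```python
import re

def name_to_text(glyph_name):
    """Decode an Adobe-style 'uniXXXX[XXXX...]' glyph name, else return None."""
    m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", glyph_name)
    if not m:
        return None
    digits = m.group(1)
    return "".join(chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4))

def glyph_to_text(glyph, cmap_reverse, substitutions, _seen=frozenset()):
    """Return the set of candidate strings the feature data allows for a glyph.

    cmap_reverse:  glyph name -> text, for glyphs mapped directly by the cmap.
    substitutions: glyph name -> list of component-glyph sequences that some
                   feature replaces with this glyph (hypothetical layout).
    """
    if glyph in cmap_reverse:
        return {cmap_reverse[glyph]}
    if glyph in _seen:  # guard against cyclic substitution data
        return set()
    seen = _seen | {glyph}
    candidates = set()
    for components in substitutions.get(glyph, []):
        partials = {""}
        for comp in components:
            comp_texts = glyph_to_text(comp, cmap_reverse, substitutions, seen)
            partials = {p + t for p in partials for t in comp_texts}
        candidates |= partials
    return candidates

def to_unicode_entry(glyph, cmap_reverse, substitutions):
    """Pick the text for one ToUnicode entry, or None if it stays unresolved."""
    candidates = glyph_to_text(glyph, cmap_reverse, substitutions)
    if len(candidates) == 1:
        return next(iter(candidates))  # features decide; the name is ignored
    hint = name_to_text(glyph)
    if hint in candidates:
        return hint  # the glyph name only breaks a tie
    return None      # no candidates, or a tie the name cannot settle

# Hypothetical sample data: Devanagari KA, VIRAMA, SSA, SHA.
cmap_reverse = {"ka": "\u0915", "virama": "\u094D", "ssa": "\u0937", "sha": "\u0936"}
substitutions = {
    # Two feature rules produce this glyph, so the features alone are ambiguous.
    "uni0915094D0937": [["ka", "virama", "ssa"], ["ka", "virama", "sha"]],
    # A misnamed glyph: the name says "A", the feature data says KA+VIRAMA+SSA.
    "uni0041": [["ka", "virama", "ssa"]],
}
```

With this sample data, to_unicode_entry("uni0915094D0937", cmap_reverse,
substitutions) settles the tie in favour of KA+VIRAMA+SSA, while the misnamed
"uni0041" glyph still resolves to KA+VIRAMA+SSA because the feature data wins
over the name.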

Cheers!

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
Libreoffice-bugs mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
