https://bugs.freedesktop.org/show_bug.cgi?id=66597
--- Comment #5 from Steve White <[email protected]> ---
Hi Khaled.

Of course we're aware that copying text from PDF is unreliable. In fact, with the current technology, based on ToUnicode, it is impossible to reproduce the original text. I am sure, however, that in the case of Indic scripts it could be done in a way that results in mostly readable text.

The reason I submitted this report to LibreOffice is that this product does the best job of the several approaches I tested. I think it could be improved with the least effort, and serve as a model for other systems.

Regarding the AGLFN, as I said, it could be used to break a tie, but otherwise, you should reconsider your statements. The AGLFN cannot carry more information than the ToUnicode stream does, and OpenType feature tables carry more information than either can. The best approach would be to judiciously use the OpenType features to populate the ToUnicode stream.

As I said, the AGLFN could be used to break a tie in OpenType feature tables. But if it conflicts with the feature tables, it cannot be right. (And in fact, that's what my tests showed: technologies that relied on AGLFN often showed mistakes because of failure to code a glyph name... which is a pity, because correct info was available.) It would be better to drop the technology.

Cheers!
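To make the proposal above concrete, here is a rough Python sketch of the idea: derive each glyph's ToUnicode entry from the substitution rules, and consult a glyph-name hint only to break a tie. Everything in it is an assumption made for illustration. `cmap_reverse` and `substitutions` are simplified stand-ins for data a real tool would have to extract from the font's cmap and GSUB tables, `name_hint` is assumed to have been decoded from the glyph name elsewhere, and the function names are hypothetical. It is not how LibreOffice or any other exporter actually works.

```python
def candidates(glyph, cmap_reverse, substitutions, _seen=frozenset()):
    """Return every Unicode string the (simplified) feature data allows for a glyph."""
    if glyph in _seen:  # guard against cyclic substitution rules
        return set()
    found = set()
    if glyph in cmap_reverse:  # glyph is mapped directly from a code point
        found.add(cmap_reverse[glyph])
    # Each rule is a sequence of input glyphs that a substitution turns into `glyph`.
    for inputs in substitutions.get(glyph, []):
        partial = {""}
        for g in inputs:
            options = candidates(g, cmap_reverse, substitutions, _seen | {glyph})
            partial = {p + o for p in partial for o in options}
        found |= partial
    return found


def to_unicode_entry(glyph, cmap_reverse, substitutions, name_hint=None):
    """Pick one string for the ToUnicode map; the name hint only breaks ties."""
    found = candidates(glyph, cmap_reverse, substitutions)
    if not found:
        return None  # nothing recoverable from the tables
    if len(found) == 1:
        return next(iter(found))  # unambiguous: the hint is never consulted
    if name_hint in found:
        return name_hint  # tie broken by the glyph name
    # The hint is absent or conflicts with the tables, so it is ignored.
    # The fallback is arbitrary but deterministic.
    return min(found, key=lambda s: (len(s), s))


# Hypothetical data: a Devanagari conjunct formed from ka + virama + ssa.
cmap_reverse = {"ka": "\u0915", "virama": "\u094D", "ssa": "\u0937"}
substitutions = {"k_ssa": [("ka", "virama", "ssa")]}

entry = to_unicode_entry("k_ssa", cmap_reverse, substitutions)
print([hex(ord(c)) for c in entry])
```

A hint that names a string the tables do not allow is simply discarded, which matches the point that a glyph name conflicting with the feature tables cannot be right.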
