https://bugs.documentfoundation.org/show_bug.cgi?id=117428
--- Comment #28 from Jonathan Clark <[email protected]> ---

To update this bug, I briefly investigated the current state of text extraction.

I performed the following tests using a trivial Devanagari Writer document containing only "नित्यानन्दकरी", exported to PDF using our filter:

Adobe Acrobat Reader now extracts the correct text. This is an improvement over the original report.

Evince also extracts the correct text.

The macOS Preview app crashed when I tried to click on the text to select it, but using the keyboard I was able to copy and paste the correct text.

Current stable Firefox (pdf.js) and Google Chrome do not seem to handle ActualText at all. Both programs seem to replace glyphs that have no ToUnicode mapping with an index, whether or not ActualText is specified.

I also tested quick-and-dirty hacks to simulate ActualText per word, to force ActualText for every cluster, and to use ActualText with no ToUnicode mappings. None of these changes improved the situation.

As noted above, ActualText per-word could have other benefits. Currently, however, I don't think it would improve the text extraction situation. The major blocker seems to be the readers that don't implement any ActualText support at all, whether it's done per-word or per-cluster.
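For readers unfamiliar with the mechanism discussed above, here is a minimal sketch of what a per-cluster ActualText span looks like in a PDF content stream. It is not the code of our export filter: the helper names and the glyph IDs are hypothetical, and it only illustrates the general form of a /Span marked-content sequence whose ActualText is a UTF-16BE hex string with a leading byte order mark.

```python
def actual_text_hex(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE with a leading BOM (FEFF)."""
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def span_with_actual_text(text: str, glyph_ids: list[int]) -> str:
    """Wrap a glyph run in a /Span marked-content sequence carrying ActualText.

    glyph_ids are written as 2-byte codes, as for a CID-keyed font with
    Identity-H encoding (an assumption made for this sketch).
    """
    glyphs = "".join(f"{gid:04X}" for gid in glyph_ids)
    return f"/Span <</ActualText {actual_text_hex(text)}>> BDC <{glyphs}> Tj EMC"


# Hypothetical glyph IDs for the cluster "नि"; real IDs depend on the embedded font.
print(span_with_actual_text("नि", [0x0123, 0x0045]))
```

A viewer that supports ActualText should return the span's replacement text on extraction instead of deriving text from the glyph codes. A viewer that ignores it falls back to ToUnicode, which matches the behaviour observed in pdf.js and Chrome.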
