https://bugs.documentfoundation.org/show_bug.cgi?id=158329
--- Comment #14 from خالد حسني <[email protected]> ---
(In reply to David Huggins-Daines from comment #13)
> (In reply to خالد حسني from comment #12)
> > On top of that, ToUnicode mapping must be unique, a glyph can appear there
> > only once, but fonts might map different characters to the same glyph, and
> > in this case ToUnicode can be used for one of these mappings, and all the
> > others will need ActualText.
>
> Thank you for the really detailed explanation! In this particular
> regression we have a sort of ligature, so ToUnicode should work, but I
> understand why it isn't sufficient in the more general case.
>
> I'll try to do a best-effort implementation of ActualText for
> pdfminer/pdfplumber, since as you say it gets used for the smallest span of
> text necessary, and since text extraction is best-effort by definition
> anyway.
>
> I haven't checked to see if poppler, qpdf, pdfium, and company are working
> on ActualText support...

Poppler supports ActualText, pdfium does not (at least last I checked), and I
don’t know about qpdf.
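
To illustrate the uniqueness constraint quoted from comment #12, here is a minimal Python sketch. It is not taken from LibreOffice, pdfminer, or any other project mentioned here: the function name `split_mappings`, the dictionary layout, and the glyph IDs are all made up for illustration. It only shows that a glyph can carry one ToUnicode entry (which may be a multi-character string, as with a ligature), and that any further text mapped to the same glyph has to be expressed some other way, such as ActualText.

```python
def split_mappings(text_to_glyph):
    """Split a font's text-to-glyph mapping into ToUnicode and ActualText parts.

    Hypothetical helper: each glyph gets at most one ToUnicode entry (the
    first text seen for it); every later text that reuses the glyph is
    returned separately as needing ActualText.
    """
    to_unicode = {}          # glyph id -> the single string ToUnicode can hold
    needs_actual_text = []   # (text, glyph id) pairs ToUnicode cannot express
    for text, glyph in text_to_glyph.items():
        if glyph not in to_unicode:
            to_unicode[glyph] = text
        else:
            needs_actual_text.append((text, glyph))
    return to_unicode, needs_actual_text


# Made-up font data: Latin "A" and Greek Alpha share glyph 36,
# and the "fi" ligature is a single glyph standing for two characters.
cmap = {"A": 36, "\u0391": 36, "B": 37, "fi": 192}
to_unicode, needs_actual_text = split_mappings(cmap)

print(to_unicode)
print(needs_actual_text)
```

In this made-up font the ligature fits in ToUnicode, because one glyph maps to one string, while Greek Alpha does not, because glyph 36 is already taken by Latin "A".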
