https://bugs.documentfoundation.org/show_bug.cgi?id=149457

V Stuart Foote <vstuart.fo...@utsa.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |NEW

--- Comment #10 from V Stuart Foote <vstuart.fo...@utsa.edu> ---
@Khaldoun, thanks for the analysis. 

I did notice the 1st issue. I don't know if that is a font fallback, or just a
manifestation of the way the glyphs are being extracted from the PDF--where the
logic for handling the glyph transformations is probably not present.

For the second, best to think of them as partial text runs or snippets. Glyphs
are encoded into the PDF with no sense of source script. The import filter
(using poppler libs) brings them into LibreOffice as just a run of text; all
lexical context is missing. Normal break iterators are not parsed even if
present. They end up recorded into the draw canvas as text box
objects--disjointed by which glyphs get strung together.

So, given the coarseness of the filter import, just getting them into the
correct RTL sequence (for bug 104597) is a great improvement. Assembling them
into lexically useful strings, sentences and paragraphs is work still to be
done. The work done for bug 118370 is not doing well with assembling the RTL
textboxes; I suspect it needs additional logic to do so.
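
To make the kind of additional logic concrete, here is a rough sketch (Python,
purely illustrative, not LibreOffice code and not what bug 118370 implemented).
It assumes each imported text box is reduced to a hypothetical Snippet with its
text already in logical order, a left edge x, and a baseline y that grows down
the page. The names Snippet, assemble_rtl_lines and y_tolerance are made up for
this example.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for one imported text box."""
    text: str  # glyph run, assumed to be in logical order already
    x: float   # left edge of the box
    y: float   # baseline position, assumed to grow down the page


def assemble_rtl_lines(snippets, y_tolerance=2.0):
    """Group snippets that share a baseline, then join each group RTL."""
    lines = []  # each entry is (baseline_y, [snippets on that line])
    for s in sorted(snippets, key=lambda s: s.y):
        # Same line if the baseline is within tolerance of the line's first box.
        if lines and abs(lines[-1][0] - s.y) <= y_tolerance:
            lines[-1][1].append(s)
        else:
            lines.append((s.y, [s]))
    # For RTL text the rightmost box comes first in reading order.
    return [
        " ".join(s.text for s in sorted(group, key=lambda s: -s.x))
        for _, group in lines
    ]
```

A real implementation would also have to cope with mixed-direction runs,
columns, and rotated text, none of which this sketch attempts.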

I'm interested in Khaled's take on things at this juncture.

-- 
You are receiving this mail because:
You are the assignee for the bug.
