https://bugs.documentfoundation.org/show_bug.cgi?id=165396
--- Comment #4 from Eyal Rozenberg <[email protected]> ---
(In reply to V Stuart Foote from comment #3)
> The dimensions of text object
> streams do not exist within the PDF for extraction. Only a starting position
> and spacing.
>
> The BT/ET for text object streams inside the PDF do not receive a statement
> of "dimensions of the object with the text in it", rather within the text
> object Starting BT and Ending ET there is simply a Tf line - with font a
> glyph size, a Td line - with text start position in x and y offset from
> bottom left and any transformation, and the Tj line - the character string
> or lookups from font dictionary.

I am not familiar with most of the specifics of PDFs' internal structure. I
know that some objects have "boxes" specified and some don't. But even if
"text object streams" don't have them, their dimensions can be readily
determined, just as PDF viewers determine them: use the font metrics to place
glyphs until reaching the end of the stream, or of the stretch of text, or
what-not. That gets you the width, i.e. the right edge.

> And on LibreOffice import via poppler lib, alignment of the draw text object
> is observed, as is general glyph size of the font. But beyond the needed
> anchoring, the draw shape is not sized to match what had been held for the
> stream within PDF.
>
> And if the font used in the PDF is not available to os/DE, the poppler ->
> cairo rendering to a text draw shape uses fall back font to render the
> assembled text span.
>
> In other words, the resulting LibreOffice draw text shapes can differ
> considerably from how they were laid down in PDF because they are rendered
> with different font glyphs.

Yes, you're describing the problematic behavior, which I believe should be
changed. I won't argue with you about marking this as an enhancement, since I
did say that what I'm suggesting is a different tradeoff of what to preserve.
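
To make the point about font metrics concrete, here is a minimal sketch of how the width of a run of text could be derived from glyph advances. It is an illustration only, not what poppler or LibreOffice actually do. The width table, the default width, and the names `text_width` and `right_edge` are all hypothetical; the only assumption taken from the PDF format is that simple fonts give glyph widths in thousandths of the font size, and that character spacing, word spacing and horizontal scaling are added per glyph.

```python
# Hypothetical glyph advance widths in 1/1000 of the font size.
# A real importer would read these from the PDF font's width data.
HYPOTHETICAL_WIDTHS = {"H": 700, "i": 200, " ": 300}
DEFAULT_WIDTH = 500  # assumed fallback for glyphs missing from the table


def text_width(text, font_size, widths=HYPOTHETICAL_WIDTHS,
               char_spacing=0.0, word_spacing=0.0, h_scale=1.0):
    """Sum per-glyph advances to get the width of a run of text.

    font_size comes from the Tf line; char_spacing, word_spacing and
    h_scale stand in for the text-state spacing and horizontal scaling.
    """
    total = 0.0
    for ch in text:
        advance = widths.get(ch, DEFAULT_WIDTH) / 1000.0 * font_size
        advance += char_spacing
        if ch == " ":
            # Word spacing only applies to the space character.
            advance += word_spacing
        total += advance * h_scale
    return total


def right_edge(x_start, text, font_size, **kwargs):
    """Right edge = start position (from the Td line) plus the run's width."""
    return x_start + text_width(text, font_size, **kwargs)
```

For example, `right_edge(72.0, "Hi", 10.0)` adds the advances of "H" and "i" at size 10 to the starting x offset of 72. With the same metrics that were used when the PDF was produced, this should recover the extent the text had in the PDF even if a fallback font is substituted for display.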
