[ https://issues.apache.org/jira/browse/PDFBOX-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727973#comment-14727973 ]
John Hewson commented on PDFBOX-2951:
-------------------------------------
{quote}
No, I had not realized this, but it's interesting and I'll see whether I
can make use of it. I mainly need to check what the "default behavior"
would be. In my PDF Analyzer I try to keep track of the current text and
graphics context, the matrix, and whatever else makes up the PDF document
"status", and then associate this with the "content" operators such as
text or image content.
{quote}
You're literally rebuilding PDFStreamEngine from scratch. There's no default
behaviour; we add our own behaviour by subclassing, in PageDrawer for rendering
and in PDFTextStripper for text extraction. But all that PDFStreamEngine does
on its own is parse the operators, keep track of the graphics state, and
provide methods which can be overridden to hook into all of that. For example,
the current font is already available via
getGraphicsState().getTextState().getFont(). I'd urge you to look at our
[CustomGraphicsStreamEngine.java|https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/rendering/CustomGraphicsStreamEngine.java]
example.
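To make the subclassing idea above concrete, here is a toy model in plain Java. It is not PDFBox code: MiniStreamEngine, GraphicsState, processOperator and the single showTextString hook are hypothetical, heavily simplified stand-ins (operands arrive pre-tokenized, and the Tj operand is already a decoded String rather than a byte string). It only illustrates the pattern being described: the base engine parses operators and tracks state (here just q/Q save/restore and Tf), while a subclass adds behaviour by overriding a hook and reading the current font from the tracked state instead of maintaining its own copy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MiniStreamEngineDemo {

    /** Hypothetical, minimal graphics state: only the current font name and size. */
    static final class GraphicsState {
        String fontName;
        float fontSize;

        GraphicsState copy() {
            GraphicsState s = new GraphicsState();
            s.fontName = fontName;
            s.fontSize = fontSize;
            return s;
        }
    }

    /** Hypothetical toy engine (NOT the PDFBox API): dispatches operators and keeps a state stack. */
    static class MiniStreamEngine {
        private final Deque<GraphicsState> stack = new ArrayDeque<>();

        MiniStreamEngine() {
            stack.push(new GraphicsState()); // initial state
        }

        protected GraphicsState getGraphicsState() {
            return stack.peek();
        }

        /** Processes one operator; state tracking happens here, not in subclasses. */
        public final void processOperator(String op, List<String> operands) {
            switch (op) {
                case "q": // save a copy of the current state
                    stack.push(getGraphicsState().copy());
                    break;
                case "Q": // restore the previous state (ignore unbalanced Q)
                    if (stack.size() > 1) {
                        stack.pop();
                    }
                    break;
                case "Tf": // set font and size in the current state
                    getGraphicsState().fontName = operands.get(0);
                    getGraphicsState().fontSize = Float.parseFloat(operands.get(1));
                    break;
                case "Tj": // show text: delegate to the overridable hook
                    showTextString(operands.get(0));
                    break;
                default: // all other operators are ignored in this toy
                    break;
            }
        }

        /** Hook that does nothing by default; subclasses add their own behaviour. */
        protected void showTextString(String text) {
        }
    }

    /** Subclass that only overrides the hook and reads the font from the tracked state. */
    static class TextLogger extends MiniStreamEngine {
        final List<String> log = new ArrayList<>();

        @Override
        protected void showTextString(String text) {
            GraphicsState gs = getGraphicsState();
            log.add(gs.fontName + "@" + gs.fontSize + ": " + text);
        }
    }

    public static void main(String[] args) {
        TextLogger engine = new TextLogger();
        engine.processOperator("Tf", List.of("/F1", "12"));
        engine.processOperator("Tj", List.of("Hello"));
        engine.processOperator("q", List.of());
        engine.processOperator("Tf", List.of("/F2", "9"));
        engine.processOperator("Tj", List.of("World"));
        engine.processOperator("Q", List.of());
        engine.processOperator("Tj", List.of("Again"));
        engine.log.forEach(System.out::println);
        // prints:
        // /F1@12.0: Hello
        // /F2@9.0: World
        // /F1@12.0: Again
    }
}
```

Because the q/Q stack lives in the base class, "Again" is reported with /F1 once Q restores the outer state; the subclass never tracks that itself. In real PDFBox code the equivalent would be a PDFStreamEngine subclass that reads getGraphicsState().getTextState().getFont() inside an overridden method.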
{quote}
As far as the actual bug report is concerned, did I understand you correctly
that the error should go away if, rather than using getString, I use getBytes
plus the current stream encoding (I will check where I get this from)?
{quote}
Yes, but don't do that; override the appropriate methods in PDFStreamEngine
instead, e.g. beginText, endText, showTextString, showTextStrings,
applyTextAdjustment, showText, showGlyph, etc.
> quotedbl causes NullPointerException
> ------------------------------------
>
> Key: PDFBOX-2951
> URL: https://issues.apache.org/jira/browse/PDFBOX-2951
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.0
> Environment: Windows 10 64 bit
> Reporter: Juergen Uhl
> Attachments: Test.jar, Test.java, Test.pdf
>
>
> I have a PDF document using (among others) the font CourierNewPS-BoldMT and
> text with this font containing a double quote.
> When calling PDFont.encode, this results in a NullPointerException due to the
> following:
> The font encoding is built using the PDF /Differences array, which overwrites
> the original "quotedbl" at index 34 with an "A". The entries for
> quotedblbase/left/right are left unchanged. As a result, the inverted
> encoding (glyph name to code) does not contain "quotedbl" as a key.
> Within encode, character code 34 (the double quote) is mapped to the glyph
> name "quotedbl", which is then not found in the inverse encoding
> (PDTrueTypeFont.encode -> int code = inverted.get(name)).
> Right before the line that causes the NullPointerException, there is a
> check whether ttf.hasGlyph("quotedbl") (false in this case) and, if
> not, whether ttf.hasGlyph("uni0022") (true in this case); however,
> this has no effect on how the code continues, and it then crashes
> because inverted.get("quotedbl") returns null, which is unboxed into an int.
> I believe this is a bug in PDFBox, but I have no idea whether the handling
> within encode should be changed (maybe using the "else" branch when
> ttf.hasGlyph("quotedbl") is false), whether code 34 should be assigned to
> quotedblbase in the first place, or something else entirely.
> I attached the file (Test.pdf) where the error occurs and a jar (main is
> com.juergisApps.pdfConverter.Test) that reproduces the problem.
> You may also see
> http://stackoverflow.com/questions/7140476/pdf-font-mapping-error
> Juergen
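The failure mode described in the report can be reproduced with plain Java maps. This sketch assumes a tiny hypothetical base encoding (only the codes mentioned in the report, plus 65 for "A"), applies a /Differences entry that puts "A" at code 34, inverts the result into a name-to-code map, and then looks up "quotedbl". It does not model PDTrueTypeFont or the TrueType hasGlyph checks: encodeUnsafe only mirrors the "int code = inverted.get(name)" pattern quoted in the report, and encodeChecked is one hypothetical defensive alternative, not necessarily the fix PDFBox adopted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DifferencesNpeDemo {

    /** Builds a code -> glyph name encoding: base entries, then /Differences overrides. */
    static Map<Integer, String> buildEncoding(Map<Integer, String> base,
                                              Map<Integer, String> differences) {
        Map<Integer, String> codeToName = new TreeMap<>(base); // sorted by code
        codeToName.putAll(differences); // /Differences replace base entries
        return codeToName;
    }

    /** Inverts code -> name into name -> code; for duplicate names the lowest code wins. */
    static Map<String, Integer> invert(Map<Integer, String> codeToName) {
        Map<String, Integer> inverted = new HashMap<>();
        for (Map.Entry<Integer, String> e : codeToName.entrySet()) {
            inverted.putIfAbsent(e.getValue(), e.getKey());
        }
        return inverted;
    }

    /** Mirrors the failing pattern: unboxing a null Integer throws NullPointerException. */
    static int encodeUnsafe(Map<String, Integer> inverted, String glyphName) {
        int code = inverted.get(glyphName); // NPE if glyphName has no code
        return code;
    }

    /** Hypothetical defensive variant: fail with a descriptive exception instead. */
    static int encodeChecked(Map<String, Integer> inverted, String glyphName) {
        Integer code = inverted.get(glyphName);
        if (code == null) {
            throw new IllegalArgumentException(
                    "No code for glyph '" + glyphName + "' in this encoding");
        }
        return code;
    }

    public static void main(String[] args) {
        Map<Integer, String> base = Map.of(34, "quotedbl", 65, "A",
                132, "quotedblbase", 147, "quotedblleft", 148, "quotedblright");
        Map<Integer, String> enc = buildEncoding(base, Map.of(34, "A"));
        Map<String, Integer> inverted = invert(enc);

        System.out.println("code 34 -> " + enc.get(34));                         // A
        System.out.println("has quotedbl? " + inverted.containsKey("quotedbl")); // false
        try {
            encodeUnsafe(inverted, "quotedbl");
        } catch (NullPointerException e) {
            System.out.println("unsafe encode: NullPointerException");
        }
        try {
            encodeChecked(inverted, "quotedbl");
        } catch (IllegalArgumentException e) {
            System.out.println("checked encode: " + e.getMessage());
        }
    }
}
```

"A" now sits at both 34 and 65, and this inversion keeps the lowest code; whatever tie-breaking policy is used, no entry for "quotedbl" survives once its only code has been overwritten, so any lookup by that name has to handle a missing value.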
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)