[ 
https://issues.apache.org/jira/browse/PDFBOX-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437849#comment-17437849
 ] 

Michael Klink commented on PDFBOX-5297:
---------------------------------------

{quote}Please have a look and feel free to comment{quote}

It is an improvement insofar as you now take the full first token up to a 
space character. Nonetheless, there are many valid *DA* strings your code will 
misinterpret.

{quote}I'm not sure how to parse the DA into instructions, find the Font 
setting one and grab the first argument. Could you elaborate?{quote}

You might want to have a look at the PDFBox class 
{{org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString}}, in 
particular its {{processAppearanceStringOperators}} method and the other 
{{process...}} method called from there. This is how PDFBox extracts text font, 
text size, and text color information from the *DA* string.
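
As a rough illustration of the idea (a simplified sketch, not the PDFBox code itself): collect operands until an operator shows up, and when that operator is {{Tf}}, take its first operand as the font name. The class and method names below are made up for this example, and the tokenizer only understands whitespace-separated names and numbers. PDFBox uses a real content stream parser, which also copes with strings, arrays, and comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of PDFBox. */
final class DaFontName {

    /**
     * Returns the font resource name set by the last Tf instruction in a
     * DA string (without the leading slash), or null if there is none.
     */
    static String fontNameFromDa(String da) {
        List<String> operands = new ArrayList<>();
        String fontName = null;
        for (String token : da.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty DA string
            }
            if (isOperand(token)) {
                operands.add(token);
                continue;
            }
            // Anything else is treated as an operator that consumes the
            // operands collected so far.
            if (token.equals("Tf") && operands.size() >= 2) {
                String name = operands.get(0); // first argument: font name
                if (name.startsWith("/")) {
                    fontName = name.substring(1);
                }
            }
            operands.clear();
        }
        return fontName;
    }

    /** Simplification: only names (/Helv) and plain numbers are operands. */
    private static boolean isOperand(String token) {
        return token.startsWith("/")
                || token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }
}
```

With this approach a *DA* such as {{0 g /Helv 12 Tf}} yields {{Helv}}, whereas taking the first token up to a space would yield {{0}}.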

{quote}Also, could you elaborate on what the set of Font names that should be 
supported, even if they're not found in the *DR*?{quote}

Strictly speaking, i.e. according to the specification, _none at all_. 
Nonetheless, Adobe Acrobat (and many other PDF form-fillers following Adobe's 
lead) accept a set of font names even if they are not defined in the *DR*, see 
[this PDFBOX-1234 
comment|https://issues.apache.org/jira/browse/PDFBOX-1234?focusedCommentId=14304601#comment-14304601].



> class org.apache.pdfbox.cos.COSName cannot be cast to class 
> org.apache.pdfbox.cos.COSString
> -------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5297
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5297
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.24
>            Reporter: Chris Newhouse
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.25, 3.0.0 PDFBox
>
>
> A customer provided us with a PDF that contains an AcroForm and has some of 
> the data filled in. There are various ways to trigger the error, but here's a 
> stacktrace:
> {code:java}
> class org.apache.pdfbox.cos.COSName cannot be cast to class 
> org.apache.pdfbox.cos.COSString (org.apache.pdfbox.cos.COSName and 
> org.apache.pdfbox.cos.COSString are in unnamed module of loader 'app')
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:91)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:114)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:331)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566){code}
> The PDF contains sensitive user information, so I cannot post it here 
> publicly, but I'd be willing to submit it to a private upload area. When I 
> use an editor to remove/change the sensitive data, the problem goes away or 
> sprouts up as a different error (related to fonts).
>  
> Here is a little bit of metadata I can provide right now:
> {code:java}
> {
>  "Author": "SE:W:CAR:MP",
>  "CreationDate": "D:20211012165530Z00'00'",
>  "Creator": "Adobe LiveCycle Designer ES 9.0",
>  "Keywords": "Fillable",
>  "ModDate": "D:20211012165530Z00'00'",
>  "Producer": "macOS Version 10.15.7 (Build 19H1417) Quartz PDFContext",
>  "Subject": "Request for Taxpayer Identification Number and Certification",
>  "Title": "Form W-9 (Rev. October 2018)"
> }{code}



