[ https://issues.apache.org/jira/browse/PDFBOX-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437849#comment-17437849 ]
Michael Klink commented on PDFBOX-5297:
---------------------------------------

{quote}Please have a look and feel free to comment{quote}

It is improved insofar as you take the full first token until a space character. Nonetheless, there are many *DA* forms your code will misinterpret.

{quote}I'm not sure how to parse the DA into instructions, find the Font setting one and grab the first argument. Could you elaborate?{quote}

You might want to have a look at the PDFBox class {{org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString}}, in particular its {{processAppearanceStringOperators}} method and the other {{process...}} method called from there. This is how PDFBox extracts text font, text size, and text color information from the *DA* string.

{quote}Also, could you elaborate on what the set of Font names that should be supported, even if they're not found in the *DR*?{quote}

Strictly speaking, i.e. according to the specification, _none at all_. Nonetheless, Adobe Acrobat (and many other PDF form-fillers following Adobe's lead) accept a set of font names even if they are not defined in the *DR*, see [this PDFBox-1234 comment|https://issues.apache.org/jira/browse/PDFBOX-1234?focusedCommentId=14304601#comment-14304601].

> class org.apache.pdfbox.cos.COSName cannot be cast to class
> org.apache.pdfbox.cos.COSString
> -------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5297
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5297
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.24
>            Reporter: Chris Newhouse
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.25, 3.0.0 PDFBox
>
>
> A customer provided us with a PDF that contains an AcroForm and has some of
> the data filled in.
> There are various ways to trigger the error, but here's a
> stacktrace:
> {code:java}
> class org.apache.pdfbox.cos.COSName cannot be cast to class
> org.apache.pdfbox.cos.COSString (org.apache.pdfbox.cos.COSName and
> org.apache.pdfbox.cos.COSString are in unnamed module of loader 'app')
>     at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:91)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:114)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:331)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566){code}
> The PDF contains sensitive user information, so I cannot post it here
> publicly, but I'd be willing to submit it to a private upload area. When I
> use an editor to remove/change the sensitive data, the problem goes away or
> sprouts up as a different error (related to fonts).
>
> Here is a little bit of metadata I can provide right now:
> {code:java}
> {
>   "Author": "SE:W:CAR:MP",
>   "CreationDate": "D:20211012165530Z00'00'",
>   "Creator": "Adobe LiveCycle Designer ES 9.0",
>   "Keywords": "Fillable",
>   "ModDate": "D:20211012165530Z00'00'",
>   "Producer": "macOS Version 10.15.7 (Build 19H1417) Quartz PDFContext",
>   "Subject": "Request for Taxpayer Identification Number and Certification",
>   "Title": "Form W-9 (Rev. October 2018)"
> }{code}
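
To illustrate the advice in the comment (split the *DA* string into operands and operators, find the font-setting operator {{Tf}}, and read its first operand), here is a minimal sketch. It is not how PDFBox does it; the class named in the comment is the reference for that. The names {{DaFontSketch}}, {{FontSetting}}, {{tokenize}} and {{findFont}} are hypothetical, and the tokenizer is a simplifying assumption: it splits on whitespace and on {{/}}, and ignores name escapes ({{#xx}}), strings, arrays and comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not a PDFBox class): locates the Tf operator in a DA string. */
final class DaFontSketch {

    /** Operands of a Tf operator: font resource name (without '/') and font size. */
    static final class FontSetting {
        final String name;
        final float size;

        FontSetting(String name, float size) {
            this.name = name;
            this.size = size;
        }
    }

    /** Splits on whitespace; a '/' also starts a new token, so "g/Helv" yields "g" and "/Helv". */
    static List<String> tokenize(String da) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : da.toCharArray()) {
            boolean white = Character.isWhitespace(c) || c == '\0';
            if (white || c == '/') {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (c == '/') {
                    current.append(c);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Returns the operands of the last well-formed "/Name size Tf" sequence, if any. */
    static Optional<FontSetting> findFont(String da) {
        List<String> tokens = tokenize(da);
        FontSetting result = null;
        for (int i = 2; i < tokens.size(); i++) {
            if (!"Tf".equals(tokens.get(i))) {
                continue;
            }
            String name = tokens.get(i - 2);
            if (name.length() < 2 || name.charAt(0) != '/') {
                continue;
            }
            try {
                result = new FontSetting(name.substring(1), Float.parseFloat(tokens.get(i - 1)));
            } catch (NumberFormatException e) {
                // Size operand is not a number: skip this Tf and keep looking.
            }
        }
        return Optional.ofNullable(result);
    }

    public static void main(String[] args) {
        // The font name is not the first token here, so taking the first token would misread it.
        findFont("0 g /Helv 12 Tf").ifPresent(f -> System.out.println(f.name + " " + f.size));
    }
}
```

The sketch keeps the last valid {{Tf}} it sees, on the assumption that a later font setting overrides an earlier one when the operators are applied in order. Whether the returned name then resolves against the *DR* fonts is a separate lookup that this example does not attempt.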