[
https://issues.apache.org/jira/browse/PDFBOX-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176044#comment-15176044
]
Christian Brandt commented on PDFBOX-3255:
------------------------------------------
Hi!
I ended up with the following routine:
{code}
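import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.encoding.Encoding;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

private void setStringValue(PDField field, String input) throws IOException
{
    /* Extract the font name from the field's own /DA entry, e.g. "/Helv 12 Tf 0 g".
       The name is a single token after '/', followed by a numeric size and Tf. */
    String da = field.getCOSObject().getString(COSName.DA);
    Matcher m = da == null ? null
            : Pattern.compile("/(\\S+)\\s+[\\d.]+\\s+Tf").matcher(da);
    String name = (m != null && m.find()) ? m.group(1) : null;
    PDResources resources = field.getAcroForm().getDefaultResources();
    PDFont font = (name != null && resources != null)
            ? resources.getFont(COSName.getPDFName(name)) : null;
    if (font instanceof PDSimpleFont)
    {
        /* Walk through the input and replace every character that
           cannot be represented by the font with a space */
        StringBuilder value = new StringBuilder(input.length());
        Encoding encoding = ((PDSimpleFont) font).getEncoding();
        for (int i = 0; i < input.length(); i++)
        {
            char c = input.charAt(i);
            value.append(".notdef".equals(encoding.getName(c)) ? ' ' : c);
        }
        field.setValue(value.toString());
    }
    else
    {
        /* No /DA entry, no matching font, or not a simple font:
           pass the input through unchanged */
        field.setValue(input);
    }
}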
{code}
Apart from the obvious performance cost, this seems to work, at least with the
test cases I tried. However:
1. It would be nicer to call
PDVariableText.getDefaultAppearanceString().getFont() to get the associated
font instead of parsing the name by hand and then fetching it from the
resources, but that method is not accessible. As it stands, I am not sure
that my regex covers all possible cases.
2. Encoding.contains('\u00AD') may return true (the value ".notdef" seems to
be stored for it), so a string comparison against ".notdef" is required, which
is not nice. The caller can of course optimize this a bit by caching the
lookup result for recurring characters, but it would make life easier if the
string comparison could be dropped altogether.
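
To make the regex concern in the first point concrete, here is a minimal sketch that uses only java.util.regex and no PDFBox classes. The class name DaFontName and the helper fontName are hypothetical, and the pattern is an assumption about what /DA strings usually look like (a name token, a numeric size, then the Tf operator). It does not decode #xx escapes in names and is not a full content-stream parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the font resource name out of a /DA string. */
final class DaFontName {

    // "/Name size Tf": the name is one non-whitespace token, the size may be
    // an integer or a decimal, and Tf must end at a word boundary.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)\\s+-?(?:\\d+\\.?\\d*|\\.\\d+)\\s+Tf\\b");

    /** Returns the font name without the leading slash, or null if none is found. */
    static String fontName(String da) {
        if (da == null) {
            return null;
        }
        Matcher m = TF.matcher(da);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fontName("/Helv 12 Tf 0 g"));   // Helv
        System.out.println(fontName("0 g /Helv 0 Tf"));    // Helv (operators before the font)
        System.out.println(fontName("/F1 10.5 Tf"));       // F1 (decimal size)
        System.out.println(fontName("0 g"));               // null (no Tf operator)
    }
}
```

The second and third inputs are the cases where a greedy "(.*)" capture or an integer-only size would go wrong, so they are worth keeping as test inputs whatever pattern is finally used.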
> Reasonable way to handle missing characters in font
> ---------------------------------------------------
>
> Key: PDFBOX-3255
> URL: https://issues.apache.org/jira/browse/PDFBOX-3255
> Project: PDFBox
> Issue Type: Wish
> Components: AcroForm
> Affects Versions: 2.0.0
> Reporter: Christian Brandt
> Labels: newbie
> Attachments: TEST.pdf
>
>
> Hello,
> We have an issue with setting form field values if the input contains
> characters that cannot be rendered with the associated font. The system
> throws an exception similar to:
> java.lang.IllegalArgumentException: U+0308 ('dieresiscmb') is not available
> in this font's encoding: MacRomanEncoding with differences
> Currently this is difficult to handle outside the framework because,
> as far as I understand (please correct me if I'm wrong), the caller has
> no way to figure out which font will eventually be used and therefore
> which characters are not renderable.
> What we would ultimately like is for the library to optionally replace
> unrenderable characters with some other existing character (e.g. space)
> instead of failing the call, or for the library to provide a way to
> recover from this error so that the user can call the method
> again with altered input.