Re: PDFFont#getEncodingFromFont and incorrect usage of StringTokenizer

Andreas Lehmkuehler Tue, 14 Dec 2010 10:47:24 -0800

Hi,


On 01.12.2010 22:40, martijn.list wrote:

The PDF document attached to bug report
https://issues.apache.org/jira/browse/PDFBOX-816 (TaxReturn-1.pdf)
throws a NumberFormatException.

#getEncodingFromFont uses a StringTokenizer to split a line into
separate tokens:

StringTokenizer st = new StringTokenizer(line);

The following line however results in a NumberFormatException because
0/NUL is read as one token.

dup 0/NUL put
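
A minimal sketch that reproduces the behaviour described above. It assumes the parsing code converts the second token of the line to an int (the actual PDFBox code is not shown in this thread); the class and method names are hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical helper class, not part of PDFBox.
class TokenizerRepro {
    // Returns the second token of the line, split on the default
    // StringTokenizer delimiters (space, tab, newline, CR, form feed).
    static String secondToken(String line) {
        StringTokenizer st = new StringTokenizer(line);
        st.nextToken();        // first token: "dup"
        return st.nextToken(); // second token: expected to be the character code
    }

    public static void main(String[] args) {
        String token = secondToken("dup 0/NUL put");
        System.out.println(token); // prints 0/NUL, since "/" is not a delimiter
        try {
            // Assumption: the caller parses the token as an integer code.
            Integer.parseInt(token);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```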

By default, the StringTokenizer only accepts the following chars as token delimiters:
" \t\n\r\f".

I think this is not correct because it seems that some delimiter chars
are missing, like (, ), <, >, [, ], {, }, /, and %.

Hmm, the problem is not the unsupported delimiter chars; it's the missing space character in the input line.


dup 0/NUL put -> dup 0 /NUL put

Is this a bug?

Yes, it is. I filed an issue on JIRA [1] and fixed the problem by replacing each "/" with " /" to ensure that there will be a delimiter at the right place.
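
The idea can be sketched roughly as follows. This is only an illustration of the "/" to " /" replacement, not the committed change itself; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of PDFBox.
class SlashPaddingSketch {
    // Inserts a space before every "/" so that a name such as /NUL always
    // starts a new token, then splits on the default whitespace delimiters.
    static List<String> tokenize(String line) {
        String padded = line.replace("/", " /");
        StringTokenizer st = new StringTokenizer(padded);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both inputs yield the tokens dup, 0, /NUL, put.
        System.out.println(tokenize("dup 0/NUL put"));
        System.out.println(tokenize("dup 0 /NUL put"));
    }
}
```

Lines that already contain the space are unaffected, because StringTokenizer skips runs of consecutive delimiters.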


Thanks for reporting!


BR Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-921
