PDFFont#getEncodingFromFont and incorrect usage of StringTokenizer

martijn.list Wed, 01 Dec 2010 13:41:05 -0800

Parsing the PDF document attached to bug report
https://issues.apache.org/jira/browse/PDFBOX-816 (TaxReturn-1.pdf)
throws a NumberFormatException.


#getEncodingFromFont uses a StringTokenizer to split a line into
separate tokens:

StringTokenizer st = new StringTokenizer(line);

The following line, however, results in a NumberFormatException because
0/NUL is read as a single token:

dup 0/NUL put
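
A minimal standalone reproduction of the behaviour, not the PDFBox code
itself. The class name DefaultTokenizerDemo and the helper secondToken are
made up for illustration, and the sketch assumes that the token after "dup"
is the one passed to Integer.parseInt:

```java
import java.util.StringTokenizer;

// Hypothetical demo class; it only mimics the tokenizing step described above.
final class DefaultTokenizerDemo {

    // Returns the token that follows the first one, using the default
    // whitespace-only delimiters of StringTokenizer.
    static String secondToken(String line) {
        StringTokenizer st = new StringTokenizer(line);
        st.nextToken();        // skip "dup"
        return st.nextToken(); // the whole "0/NUL", since '/' is not a delimiter
    }

    public static void main(String[] args) {
        String token = secondToken("dup 0/NUL put");
        System.out.println("second token: " + token);
        try {
            // Assumption: the character code is parsed as an int at this point.
            Integer.parseInt(token);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for token " + token);
        }
    }
}
```

With a space before the slash ("dup 0 /NUL put") the same helper returns
"0" and the parse succeeds, which is probably why most fonts do not trigger
the problem.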

By default, a StringTokenizer only treats the following chars as token
delimiters: " \t\n\r\f".

I think this is not correct, because several chars that also act as
delimiters seem to be missing: (, ), <, >, [, ], {, }, /, and %
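
One possible way to handle the '/' case is sketched below. It is only an
illustration under my own assumptions, not a proposed patch: the class
SlashAwareTokenizer and its tokenize method are hypothetical, and only '/'
is treated as an extra delimiter (the other chars listed above are not
handled). It uses the three-argument StringTokenizer constructor with
returnDelims set to true, so that the '/' is kept and can be re-attached to
the name that follows it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper; splits on whitespace and also starts a new token at '/'.
final class SlashAwareTokenizer {

    // The default StringTokenizer delimiters.
    private static final String WHITESPACE = " \t\n\r\f";

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter comes back as a one-char token.
        StringTokenizer st = new StringTokenizer(line, WHITESPACE + "/", true);
        boolean pendingSlash = false;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            boolean isWhitespace =
                tok.length() == 1 && WHITESPACE.indexOf(tok.charAt(0)) >= 0;
            if (isWhitespace) {
                // A '/' followed by whitespace is emitted on its own.
                if (pendingSlash) {
                    tokens.add("/");
                    pendingSlash = false;
                }
            } else if (tok.equals("/")) {
                // A second '/' in a row: emit the first one on its own.
                if (pendingSlash) {
                    tokens.add("/");
                }
                pendingSlash = true;
            } else {
                // Re-attach a preceding '/' so the name reads "/NUL".
                tokens.add(pendingSlash ? "/" + tok : tok);
                pendingSlash = false;
            }
        }
        if (pendingSlash) {
            tokens.add("/");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dup 0/NUL put"));
    }
}
```

With this sketch, "dup 0/NUL put" and "dup 0 /NUL put" both yield the four
tokens dup, 0, /NUL and put, so the second token can be parsed as a number
in either case.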

Is this a bug?

Kind regards,

Martijn Brinkers
