Hi,
On 15.05.2013 14:56, Maruan Sahyoun wrote:
Hi,
currently PDFBox has a number of workarounds "hidden" in the code for real-world
PDFs (e.g. PDFBOX-1172) which are not in line with the spec. There are several
options to deal with that,
e.g.
a) keep the workarounds in the core code
IMO we can't drop them. Whenever a parsing issue arises, people often
argue that all PDF readers but PDFBox are able to handle the PDF in
question. So people expect a PDF reader to work in any situation,
whether the PDF follows the spec or not. That's sad, but that's life :-(
b) throw an exception and stop working
We should add some (special) logging, so that one can detect such glitches.
c) handle it through a pluggable extension
I'm not sure there is one solution for every use case. Sometimes it's just a
question of the format used (e.g. PDFBOX-1172) and sometimes there are bigger
differences.
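
To make the three options a bit more concrete, here is a rough sketch of how a pluggable handler could cover all of them: a lenient handler keeps the workaround and records it (a plus logging), a strict handler makes the parser stop (b), and the interface itself is the extension point (c). All names (SpecViolationHandler, LenientHandler, StrictHandler, SketchParser, the issue id "EXAMPLE-1") are hypothetical and not part of PDFBox, and the "violation" is a toy one (whitespace around a number token), not a real PDF spec rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension point: none of these types exist in PDFBox.
interface SpecViolationHandler {
    // Returns true if the parser may apply its workaround and continue.
    boolean onViolation(String issueId, String description);
}

// Options a) + logging: keep the workaround, but record every glitch.
final class LenientHandler implements SpecViolationHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public boolean onViolation(String issueId, String description) {
        seen.add(issueId + ": " + description);
        return true;
    }
}

// Option b): refuse the workaround so the parser stops.
final class StrictHandler implements SpecViolationHandler {
    @Override
    public boolean onViolation(String issueId, String description) {
        return false;
    }
}

// Toy parser: treats surrounding whitespace in a number token as a violation.
final class SketchParser {
    private final SpecViolationHandler handler;

    SketchParser(SpecViolationHandler handler) {
        this.handler = handler;
    }

    int parseNumber(String token) {
        String trimmed = token.trim();
        if (!trimmed.equals(token)) {
            // Ask the plugged-in handler whether the workaround is allowed.
            if (!handler.onViolation("EXAMPLE-1", "whitespace around number token")) {
                throw new IllegalStateException("Input violates the (toy) spec: EXAMPLE-1");
            }
        }
        return Integer.parseInt(trimmed);
    }
}
```

With something like this, a caller who wants strict validation would pass a StrictHandler, while the default could stay lenient so that existing behaviour does not change.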
WDYT?
Maruan Sahyoun
BR
Andreas Lehmkühler