> Re 308576.pdf: the text extraction has a huge loss, but a manual check shows 
> it is identical. However that file has the NPE from PDActionURI.getURI(), 
> could it be that this results in an abort of text extraction?
Same for 569017.pdf.

Likely.  There are two "per file pair contents" files.  In the one ending with 
"_ignore_exceptions.xlsx", a file pair is not reported if an exception was 
caught for either of its files (which is why 308576.pdf and 569017.pdf aren't 
in that file).  The other one, "*_with_exceptions.xlsx", also includes the 
pairs where an exception was caught.  Based on your feedback, should I add two 
boolean columns, exceptionInA and exceptionInB, to "*_with_exceptions.xlsx"?
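
To make the proposal concrete, here is a rough sketch of what I have in mind. It is not the actual report code: the PairRow type, its field names and the sample file names are made up for illustration, and only the two column names come from the question above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PairRow:
    """Hypothetical row for one file pair (not the real report schema)."""
    file_name: str
    exception_a: Optional[str]  # exception caught in run A, or None
    exception_b: Optional[str]  # exception caught in run B, or None


def with_exception_flags(rows: List[PairRow]) -> List[Dict[str, object]]:
    """Keep every pair and add the two proposed boolean columns."""
    return [
        {
            "file": r.file_name,
            "exceptionInA": r.exception_a is not None,
            "exceptionInB": r.exception_b is not None,
        }
        for r in rows
    ]


def ignore_exceptions(flagged: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Drop any pair where an exception was caught on either side."""
    return [
        row for row in flagged
        if not (row["exceptionInA"] or row["exceptionInB"])
    ]


# Made-up sample data, only to show the shape of the output.
rows = [
    PairRow("clean.pdf", None, None),
    PairRow("broken_in_b.pdf", None, "NullPointerException"),
]
flagged = with_exception_flags(rows)
filtered = ignore_exceptions(flagged)
```

With the flags in place, the "ignore exceptions" view is just a filter over the "with exceptions" view, so a large text loss can be checked against an exception on either side in one place.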
