Tim Allison created TIKA-2432:
---------------------------------
Summary: Convert RTFParser's asserts to TikaExceptions
Key: TIKA-2432
URL: https://issues.apache.org/jira/browse/TIKA-2432
Project: Tika
Issue Type: Improvement
Affects Versions: 1.16
Reporter: Tim Allison
Priority: Minor
The RTFParser relies on asserts in numerous places. With a fuzzed/corrupted
file and assertions enabled, the user will get an AssertionError rather than a
TikaException.
It looks like the idea in several places is to let the user turn off
assert-checking to get a lenient parser.
{noformat}
// In document
if (equals("b")) {
    // b0
    assert param == 0;
    if (groupState.bold) {
{noformat}
In other places, though, the assert checks for a showstopper.
{noformat}
private void addOutputByte(int b) throws IOException, SAXException,
        TikaException {
    assert b >= 0 && b < 256 : "byte value out of range: " + b;
{noformat}
It would be useful to distinguish between these. I propose adding a
"beLenient" parameter (or something) to the RTFParser with default=true, that
would ignore the first case if lenient, but would throw a TikaException if
lenient=false. However, we'd always want to throw a TikaException for the
second case.
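A rough sketch of how the two cases might be separated, offered only as an illustration of the proposal and not as the actual RTFParser change. Everything here is hypothetical: the class name LenientChecksSketch, the helpers checkLenient and checkStrict, the handleBoldOff method, and ParseCheckException, which stands in for TikaException so that the snippet compiles without Tika on the classpath.

```java
// Hypothetical sketch; none of these names exist in Tika.
class LenientChecksSketch {

    /** Stand-in for TikaException so the sketch is self-contained. */
    static class ParseCheckException extends Exception {
        ParseCheckException(String message) {
            super(message);
        }
    }

    // Mirrors the proposed "beLenient" parameter (default=true in the proposal).
    private final boolean lenient;

    LenientChecksSketch(boolean lenient) {
        this.lenient = lenient;
    }

    /** First case: a recoverable oddity. Ignored if lenient, thrown otherwise. */
    void checkLenient(boolean condition, String message) throws ParseCheckException {
        if (!condition && !lenient) {
            throw new ParseCheckException(message);
        }
    }

    /** Second case: a showstopper. Always thrown, regardless of leniency. */
    static void checkStrict(boolean condition, String message) throws ParseCheckException {
        if (!condition) {
            throw new ParseCheckException(message);
        }
    }

    /** Replaces "assert param == 0" from the first excerpt. */
    void handleBoldOff(int param) throws ParseCheckException {
        checkLenient(param == 0, "unexpected param for b0: " + param);
    }

    /** Replaces the range assert from the second excerpt. */
    void addOutputByte(int b) throws ParseCheckException {
        checkStrict(b >= 0 && b < 256, "byte value out of range: " + b);
    }

    public static void main(String[] args) {
        LenientChecksSketch parser = new LenientChecksSketch(true);
        try {
            parser.handleBoldOff(5);   // tolerated because lenient=true
            parser.addOutputByte(300); // rejected even when lenient
        } catch (ParseCheckException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Unlike an assert, neither check depends on whether the JVM was started with assertions enabled, so the behaviour would be controlled only by the parser's own setting.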
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)