[ https://issues.apache.org/jira/browse/TIKA-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2432:
------------------------------
Description:
The RTFParser relies on asserts in numerous places. With a fuzzed/corrupted
file, the user will get an AssertionError rather than a TikaException.
It looks like the intent in several places is to let the user turn off
assertion checking in order to get a lenient parser.
{noformat}
// In document
if (equals("b")) {
    // b0
    assert param == 0;
    if (groupState.bold) {
{noformat}
In other places, though, the assert checks for a showstopper.
{noformat}
private void addOutputByte(int b) throws IOException, SAXException,
        TikaException {
    assert b >= 0 && b < 256 : "byte value out of range: " + b;
{noformat}
It would be useful to distinguish between these two cases. I propose adding a
"beLenient" parameter (or something similar) to the RTFParser, with
default=true. When lenient, the parser would ignore failures of the first
kind; with lenient=false, it would throw a TikaException instead. Failures of
the second kind should always throw a TikaException.
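For illustration, here is a minimal sketch of how the two cases might be separated. Everything in it is hypothetical: the class name, the checkRecoverable/checkFatal helpers, and the local TikaException stand-in (declared only so the snippet compiles without Tika on the classpath) are assumptions, not the actual RTFParser code.

```java
// Hypothetical sketch only; not the real RTFParser.
class LenientAssertSketch {

    /** Local stand-in for org.apache.tika.exception.TikaException. */
    static class TikaException extends Exception {
        TikaException(String msg) {
            super(msg);
        }
    }

    // Proposed flag; the issue suggests true as the default.
    private final boolean beLenient;

    LenientAssertSketch(boolean beLenient) {
        this.beLenient = beLenient;
    }

    /** First case: a recoverable oddity. Ignored when lenient. */
    void checkRecoverable(boolean condition, String message) throws TikaException {
        if (!condition && !beLenient) {
            throw new TikaException(message);
        }
    }

    /** Second case: a showstopper. Always reported, whatever the flag says. */
    static void checkFatal(boolean condition, String message) throws TikaException {
        if (!condition) {
            throw new TikaException(message);
        }
    }

    /** Mirrors the addOutputByte check quoted above, minus the actual output. */
    void addOutputByte(int b) throws TikaException {
        checkFatal(b >= 0 && b < 256, "byte value out of range: " + b);
        // ... the real method would append b to its output buffer here
    }
}
```

With this shape, a call such as checkRecoverable(param == 0, "unexpected parameter for b") would replace the first kind of assert, and the result would no longer depend on whether the JVM was started with assertions enabled.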
> Convert RTFParser's asserts to TikaExceptions
> ---------------------------------------------
>
> Key: TIKA-2432
> URL: https://issues.apache.org/jira/browse/TIKA-2432
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 1.16
> Reporter: Tim Allison
> Priority: Minor