[ 
https://issues.apache.org/jira/browse/TIKA-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2432:
------------------------------
    Description: 
The RTFParser relies on asserts in numerous places.  With a fuzzed/corrupted 
file and assertions enabled (-ea), the user will get an AssertionError rather 
than a TikaException.

In several places, the intent appears to be that the user can turn off 
assert-checking to get a lenient parser.  

{noformat}
            // In document
            if (equals("b")) {
                // b0
                assert param == 0;
                if (groupState.bold) {
{noformat}

In other places, though, the assert checks for a showstopper.

{noformat}
    private void addOutputByte(int b) throws IOException, SAXException, TikaException {
        assert b >= 0 && b < 256 : "byte value out of range: " + b;
{noformat} 

It would be useful to distinguish between these.  I propose adding a 
"beLenient" parameter (or something similar) to the RTFParser, defaulting to 
true.  For the first case, the parser would ignore the failed check when 
lenient=true and throw a TikaException when lenient=false.  For the second 
case, we'd always want to throw a TikaException.
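
A rough sketch of how the two cases might be separated is below.  This is not 
the RTFParser code: the class name, the "lenient" field, and the method 
checkBoldOffParam are hypothetical, and the nested TikaException is only a 
stand-in for org.apache.tika.exception.TikaException so the example compiles 
on its own.

```java
// Hypothetical sketch only; names other than addOutputByte are assumptions.
class LenientRtfSketch {

    /** Stand-in for org.apache.tika.exception.TikaException. */
    static class TikaException extends Exception {
        TikaException(String msg) {
            super(msg);
        }
    }

    // Hypothetical "beLenient" setting; the proposal is default=true.
    private final boolean lenient;
    private final StringBuilder out = new StringBuilder();

    LenientRtfSketch() {
        this(true);
    }

    LenientRtfSketch(boolean lenient) {
        this.lenient = lenient;
    }

    /** First case: a recoverable oddity (was: assert param == 0). */
    void checkBoldOffParam(int param) throws TikaException {
        if (param != 0 && !lenient) {
            throw new TikaException("unexpected parameter for \\b: " + param);
        }
        // Lenient mode: ignore the unexpected value and keep parsing.
    }

    /** Second case: a showstopper, reported regardless of the lenient flag. */
    void addOutputByte(int b) throws TikaException {
        if (b < 0 || b > 255) {
            throw new TikaException("byte value out of range: " + b);
        }
        out.append((char) b);
    }

    String output() {
        return out.toString();
    }
}
```

With this shape, a caller only ever has to catch TikaException, whether or not 
the JVM was started with -ea.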


> Convert RTFParser's asserts to TikaExceptions
> ---------------------------------------------
>
>                 Key: TIKA-2432
>                 URL: https://issues.apache.org/jira/browse/TIKA-2432
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 1.16
>            Reporter: Tim Allison
>            Priority: Minor
>
> The RTFParser relies on asserts in numerous places.  With a fuzzed/corrupted 
> file and assertions enabled (-ea), the user will get an AssertionError rather 
> than a TikaException.
> In several places, the intent appears to be that the user can turn off 
> assert-checking to get a lenient parser.  
> {noformat}
>             // In document
>             if (equals("b")) {
>                 // b0
>                 assert param == 0;
>                 if (groupState.bold) {
> {noformat}
> In other places, though, the assert checks for a showstopper.
> {noformat}
>     private void addOutputByte(int b) throws IOException, SAXException, TikaException {
>         assert b >= 0 && b < 256 : "byte value out of range: " + b;
> {noformat} 
> It would be useful to distinguish between these.  I propose adding a 
> "beLenient" parameter (or something similar) to the RTFParser, defaulting to 
> true.  For the first case, the parser would ignore the failed check when 
> lenient=true and throw a TikaException when lenient=false.  For the second 
> case, we'd always want to throw a TikaException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
