[ https://issues.apache.org/jira/browse/TIKA-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087273#comment-16087273 ]

Tim Allison commented on TIKA-2430:
-----------------------------------

[~lfcnassif], there are now two options: randomly truncate a file, or overwrite
randomly chosen bytes with random bytes.  If you see a more common corruption
pattern in practice (e.g., a block-length chunk randomly written into a file),
please re-open this issue.

With just one test file, this has already revealed two areas for improvement
in POI.  I haven't yet been able to reproduce the EMF bug with the one test
file I used...

> Add at least dev test capability to run Tika against corrupted files in our 
> test suite
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2430
>                 URL: https://issues.apache.org/jira/browse/TIKA-2430
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.17
>
>
> [~lfcnassif] observed on TIKA-2428 that a corrupt file caused a permanent
> hang in the EMFParser.  Files can be corrupted for various reasons.  We can
> add some optional code to let people experiment with running Tika against
> randomly corrupted versions of the files in our test suite.  I suspect that,
> at first, this will unearth too many errors for it to be run on a regular
> basis.  Let's at least add some code in tika-parsers so that devs can run
> the tests.
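Because the original report was a permanent hang rather than an exception, a
dev-only harness needs a per-parse timeout. A minimal sketch, assuming the parse
call is wrapped in a Consumer<byte[]> that rethrows Tika's checked exceptions
(IOException, SAXException, TikaException) as unchecked ones. CorruptionHarness,
Outcome and run are hypothetical names, not Tika APIs, and the corruption
strategy is passed in as a function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/**
 * Hypothetical dev-only harness (not part of Tika): feeds randomly corrupted
 * copies of one test file to a parse function and classifies each run.
 */
public final class CorruptionHarness {

    public enum Outcome { OK, EXCEPTION, TIMEOUT }

    private CorruptionHarness() {}

    /**
     * @param original      bytes of an uncorrupted test file
     * @param corruptor     builds a corrupted copy from the original and a seeded Random
     * @param parse         placeholder wrapping the parser call; throws only unchecked exceptions
     * @param iterations    number of corrupted variants to try
     * @param timeoutMillis how long one parse may run before it is reported as a hang
     * @param baseSeed      run i uses seed baseSeed + i, so any failure can be replayed
     */
    public static List<Outcome> run(byte[] original,
                                    BiFunction<byte[], Random, byte[]> corruptor,
                                    Consumer<byte[]> parse,
                                    int iterations,
                                    long timeoutMillis,
                                    long baseSeed) {
        List<Outcome> outcomes = new ArrayList<>();
        // Cached pool: a hung parse occupies its own thread without blocking the next run.
        // Daemon threads: a parse that ignores interruption must not keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "corruption-harness");
            t.setDaemon(true);
            return t;
        });
        try {
            for (int i = 0; i < iterations; i++) {
                byte[] corrupted = corruptor.apply(original.clone(), new Random(baseSeed + i));
                Future<?> future = executor.submit(() -> parse.accept(corrupted));
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    outcomes.add(Outcome.OK);
                } catch (ExecutionException e) {
                    outcomes.add(Outcome.EXCEPTION); // parse threw; e.getCause() has the details
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupts the thread; a busy loop may ignore this
                    outcomes.add(Outcome.TIMEOUT);
                } catch (InterruptedException e) {
                    future.cancel(true);
                    Thread.currentThread().interrupt(); // stop early, preserving the interrupt flag
                    break;
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return outcomes;
    }
}
```

Java cannot forcibly stop a thread, so a parser stuck in a loop that ignores
interruption keeps running on its daemon thread. If hangs turn out to be common,
running each parse in a separate process is the more robust choice.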



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
