[ 
https://issues.apache.org/jira/browse/TIKA-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020622#comment-13020622
 ] 

Benjamin Douglas commented on TIKA-640:
---------------------------------------

Per TIKA-461, a patch was recently committed to trunk that raises the limit to 
10,000 characters, since 1,000 was too restrictive. The problem with setting it 
to unlimited (-1, as you show in the example) is that, because of the way 
mime4j works, all of the data for a header is read into a single String. The 
RFC does not put any limit on how many characters can go into a header, so that 
String could potentially be very large. As far as I understand the goals of the 
Tika library, it should handle arbitrarily large files, which is why it uses a 
streaming model. Since headers cannot be streamed with mime4j, some artificial 
limit must be set to keep a single header from taking up too much heap space.
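
To illustrate the trade-off, here is a minimal sketch in plain Java. It is not 
mime4j's or Tika's actual code: the class BoundedHeaderReader and the method 
readHeader are hypothetical names made up for this example. It buffers one 
logical (unfolded) header into memory, as a non-streaming header parser has to, 
and gives up once a configurable cap is exceeded. A negative cap is assumed to 
mean "unlimited", mirroring the -1 in the proposed configuration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class BoundedHeaderReader {

    // Hypothetical helper, not part of mime4j or Tika.
    // Reads one logical header field, unfolding continuation lines, and
    // fails once more than maxLen characters would be buffered.
    // A negative maxLen is treated as "no limit" (an assumption of this sketch).
    public static String readHeader(PushbackReader in, int maxLen) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                continue; // skip CR; LF ends the physical line
            }
            if (c == '\n') {
                int next = in.read();
                if (next == ' ' || next == '\t') {
                    c = next; // folded line: the header continues
                } else {
                    if (next != -1) {
                        in.unread(next); // first char of the next header
                    }
                    break;
                }
            }
            if (maxLen >= 0 && sb.length() >= maxLen) {
                throw new IOException("Header exceeds " + maxLen + " characters");
            }
            sb.append((char) c); // the whole header accumulates on the heap
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String raw = "Subject: a long\r\n folded subject\r\nFrom: x@example.org\r\n";
        PushbackReader in = new PushbackReader(new StringReader(raw));
        System.out.println(readHeader(in, 10000));
        System.out.println(readHeader(in, 10000));
    }
}
```

With the cap set to -1, the StringBuilder grows for as long as the sender keeps 
folding the header, which is the heap risk described above. With a finite cap, 
the cost is that an oversized header is rejected instead of parsed.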

> RFC822Parser should configure Mime4j not to fail reading mails containing 
> more than 1000 chars in one headers text (even if folded)
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-640
>                 URL: https://issues.apache.org/jira/browse/TIKA-640
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 0.9
>         Environment: All
>            Reporter: Jens Wilmer
>              Labels: mail, rfc822parser
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The standard configuration of Mime4j accepts only 1000 characters per line and 
> 1000 characters per header. The streaming approach of Tika should not need 
> these limitations. When a limit is exceeded, an exception is thrown and none of 
> the data read so far is available.
> Solution:
> Replace all occurrences of:
> Parser parser = new RFC822Parser();
> by:
> MimeEntityConfig config = new MimeEntityConfig();
> config.setMaxLineLen(-1);
> config.setMaxContentLen(-1);
> Parser parser = new RFC822Parser(config);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
