[ https://issues.apache.org/jira/browse/TIKA-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020622#comment-13020622 ]
Benjamin Douglas commented on TIKA-640:
---------------------------------------
Per TIKA-461, a patch was recently applied to trunk that raises the limit to
10,000 characters, since 1,000 was too restrictive. The problem with setting it
to unlimited (-1, as in your example) is that, because of the way mime4j works,
all of a header's data is read into a single String. The RFC does not put any
limit on how many characters can go into a header, so this could potentially be
very large. As far as I understand the goals of the Tika library, it should
handle arbitrarily large files and therefore uses a streaming model. Since
headers cannot be streamed with mime4j, some artificial limit must be set to
avoid taking up too much heap space.
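
To make the trade-off concrete, here is a minimal sketch of a header reader
that buffers one (possibly folded) header into memory and gives up once a
configurable cap is exceeded. It is not mime4j's implementation; the class
BoundedHeaderReader, the method readHeader and the parameter maxHeaderLen are
hypothetical names used only for illustration. Following the convention in the
example from the issue, a negative limit is treated as "unlimited".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical illustration only; this is not part of Tika or mime4j.
class BoundedHeaderReader {

    /**
     * Reads one logical header field, joining folded continuation lines,
     * and fails as soon as the buffered text exceeds maxHeaderLen.
     * A negative maxHeaderLen disables the check (unbounded buffering).
     */
    static String readHeader(BufferedReader in, int maxHeaderLen) throws IOException {
        StringBuilder header = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                continue; // ignore CR; LF ends the physical line
            }
            if (c == '\n') {
                // Peek at the next character: a leading space or tab means
                // the header is folded and continues on the next line.
                in.mark(1);
                int next = in.read();
                in.reset();
                if (next != ' ' && next != '\t') {
                    break; // end of this header field
                }
                continue; // drop the line break and keep reading
            }
            header.append((char) c);
            // The whole header lives in this one buffer, so the cap is the
            // only thing that bounds heap usage for a hostile or broken mail.
            if (maxHeaderLen >= 0 && header.length() > maxHeaderLen) {
                throw new IOException("Header exceeds " + maxHeaderLen + " characters");
            }
        }
        return header.toString();
    }

    public static void main(String[] args) throws IOException {
        String mail = "Subject: a folded\r\n header value\r\n"
                + "From: someone@example.org\r\n\r\n";
        BufferedReader in = new BufferedReader(new StringReader(mail));
        System.out.println(readHeader(in, 10_000)); // first header, unfolded
        System.out.println(readHeader(in, 10_000)); // second header
    }
}
```

With a limit of -1 the StringBuilder above can grow until the JVM runs out of
heap, which is the concern with the unlimited setting.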
> RFC822Parser should configure Mime4j not to fail reading mails containing
> more than 1000 chars in one headers text (even if folded)
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-640
> URL: https://issues.apache.org/jira/browse/TIKA-640
> Project: Tika
> Issue Type: Wish
> Components: parser
> Affects Versions: 0.9
> Environment: All
> Reporter: Jens Wilmer
> Labels: mail, rfc822parser
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> The standard configuration of Mime4j accepts only 1000 characters per line and
> 1000 characters per header. The streaming approach of Tika should not need
> these limitations. When a limit is exceeded, an exception is thrown and none
> of the data read so far is available.
> Solution:
> Replace all occurrences of:
> Parser parser = new RFC822Parser();
> by:
> MimeEntityConfig config = new MimeEntityConfig();
> config.setMaxLineLen(-1);
> config.setMaxContentLen(-1);
> Parser parser = new RFC822Parser(config);