RFC822 messages not parsed
--------------------------
Key: TIKA-461
URL: https://issues.apache.org/jira/browse/TIKA-461
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.7
Reporter: Joshua Turner
Presented with an RFC822 message exported from Thunderbird, AutoDetectParser
produces an empty body and a Metadata containing only one key-value pair:
"Content-Type=message/rfc822". Calling MboxParser directly also gives an
empty body, but with two metadata pairs: "Content-Encoding=us-ascii" and
"Content-Type=application/mbox".
A quick peek at the source of MboxParser shows that the implementation is
pretty naive. If the wiring can be sorted out, something like Apache James'
mime4j might be a better bet.
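For comparison, here is the minimum a parser has to do with a single message/rfc822 file before any MIME handling starts. The header block ends at the first empty line. Folded header lines, meaning lines that begin with a space or tab, continue the previous header. Everything after the blank line is the body. This is a minimal sketch using only the Java standard library. `Rfc822Sketch` and `Parsed` are hypothetical names. It is not Tika's MboxParser and not the mime4j API. It assumes one message with no mbox "From " separator line. It also ignores multipart bodies, encoded-words and Content-Transfer-Encoding, which is where a library such as mime4j would do the real work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical minimal RFC 822 splitter (not Tika's MboxParser, not mime4j).
 * Separates headers from body and unfolds continuation lines; nothing more.
 */
public class Rfc822Sketch {

    /** Header name/value pairs in original order, plus the raw body text. */
    public static final class Parsed {
        public final Map<String, String> headers;
        public final String body;

        Parsed(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    public static Parsed parse(String message) {
        try (BufferedReader reader = new BufferedReader(new StringReader(message))) {
            List<String> headerLines = new ArrayList<>();
            String line;
            // The header section ends at the first empty line (readLine strips CRLF).
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                boolean folded = line.startsWith(" ") || line.startsWith("\t");
                if (folded && !headerLines.isEmpty()) {
                    // Continuation line: append to the previous header,
                    // collapsing the fold to a single space (a simplification).
                    int last = headerLines.size() - 1;
                    headerLines.set(last, headerLines.get(last) + " " + line.trim());
                } else {
                    headerLines.add(line);
                }
            }
            Map<String, String> headers = new LinkedHashMap<>();
            for (String h : headerLines) {
                int colon = h.indexOf(':');
                if (colon > 0) {
                    String name = h.substring(0, colon).trim();
                    String value = h.substring(colon + 1).trim();
                    // Repeated headers (e.g. Received) are joined here;
                    // a real parser would keep them as separate values.
                    headers.merge(name, value, (a, b) -> a + ", " + b);
                }
            }
            // Everything after the blank line is the body, normalized to LF.
            StringBuilder body = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return new Parsed(headers, body.toString());
        } catch (IOException e) {
            // StringReader does not really fail, but readLine declares IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String msg = "From: alice@example.com\r\n"
                + "Subject: a long\r\n"
                + "\tfolded subject\r\n"
                + "Content-Type: text/plain; charset=us-ascii\r\n"
                + "\r\n"
                + "Hello Bob.\r\n";
        Parsed p = parse(msg);
        System.out.println(p.headers);
        System.out.print(p.body);
    }
}
```

Even this much is enough to surface the Subject, From and Content-Type headers and a non-empty body for a simple single-part message. A multipart message would still come back as one opaque body, which is the gap a full MIME parser would close.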