[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915708#action_12915708 ]
Julien Nioche commented on TIKA-461:
------------------------------------
Nick,
Thanks for taking the time to review my patch.
bq. It'd probably be good to see some more tests with it. For now, just
checking your basic message should be fine, but I'd suggest we also try to get
an email with plain text, html, images and similar in to check the more complex
bits.
Agreed.
bq. In terms of the nested parser, I'm tempted to say we do something so that
plain text comes out without any extra work needed. Anything else gets handled
via a Parser fetched from the ParseContext if required, much as we're doing for
container formats like zip, .docx etc. That way, you can throw a simple email
at it and get the text, but the rest of the parts are available if you want them
I hadn't noticed that you've added org.apache.tika.extractor; it seems an
elegant way of doing this. I will have a closer look and see how I can leverage
it in RFC822Parser.
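To make the suggested behaviour concrete, here is a rough, self-contained sketch of the dispatch idea: plain-text parts are emitted directly, and anything else is handed to a delegate looked up from a context, if one was supplied. All of the types below (Part, PartParser, Context) are hypothetical stand-ins written for this illustration; they are not Tika's ParseContext, nor the classes in org.apache.tika.extractor, and the real RFC822Parser may well end up looking different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: every type here is a hypothetical stand-in, NOT a Tika class.
final class PartDispatchSketch {

    /** A decoded message part: its media type and its (already decoded) content. */
    static final class Part {
        final String mediaType;
        final String content;

        Part(String mediaType, String content) {
            this.mediaType = mediaType;
            this.content = content;
        }
    }

    /** Hypothetical handler for non-plain-text parts (plays the role of a nested parser). */
    interface PartParser {
        String parse(Part part);
    }

    /** Minimal type-keyed lookup, loosely modelled on the idea of a parse context. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        /** Returns the registered value for the key, or null if none was set. */
        <T> T get(Class<T> key) {
            return key.cast(entries.get(key));
        }
    }

    /**
     * Plain text comes out without any extra work. Other parts are passed to
     * the delegate from the context when one is present, and skipped otherwise.
     */
    static String extractText(List<Part> parts, Context context) {
        StringBuilder out = new StringBuilder();
        PartParser delegate = context.get(PartParser.class);
        for (Part part : parts) {
            if (part.mediaType.startsWith("text/plain")) {
                out.append(part.content).append('\n');
            } else if (delegate != null) {
                out.append(delegate.parse(part)).append('\n');
            }
        }
        return out.toString();
    }
}
```

With an empty Context, a simple email yields just its text; registering a PartParser makes the HTML, image and other parts available to callers who want them.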
bq. Also, the james jars need to be listed in the tika bundle pom so they get
properly included
OK, I did not know about that. Thanks.
> RFC822 messages not parsed
> --------------------------
>
> Key: TIKA-461
> URL: https://issues.apache.org/jira/browse/TIKA-461
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 0.7
> Reporter: Joshua Turner
> Assignee: Julien Nioche
> Attachments: TIKA-461.patch
>
>
> Presented with an RFC822 message exported from Thunderbird, AutoDetectParser
> produces an empty body, and a Metadata containing only one key-value pair:
> "Content-Type=message/rfc822". Directly calling MboxParser likewise gives an
> empty body, but with two metadata pairs: "Content-Encoding=us-ascii
> Content-Type=application/mbox".
> A quick peek at the source of MboxParser shows that the implementation is
> pretty naive. If the wiring can be sorted out, something like Apache James'
> mime4j might be a better bet.