[ https://issues.apache.org/jira/browse/TIKA-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284699#comment-15284699 ]
Tim Allison commented on TIKA-1970:
-----------------------------------
This looks to be a bug in the underlying James library.
{noformat}
ParsedField parsedField = LenientFieldParser.getParser().parse(
        field, DecodeMonitor.SILENT);
...
DateTimeField dateField = (DateTimeField) parsedField;
{noformat}
{{dateField.getDate()}} returns {{null}}, which suggests that the James parser can't
read a date in the format {{16 May 2016 at 09:30:32 GMT+1}}.
We're using the latest version of James, but that dates back to 2012.
Some options:
1) fix this in James (unlikely given lack of activity??)
2) add our own date parser
3) find another rfc822 parser that is more actively maintained (??)
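For option 2, a fallback parser could look roughly like the sketch below. It is only an illustration, not what Tika or James actually does: the class name {{MacMailDateParser}} and its {{parse}} method are hypothetical, it uses only {{java.time}} and {{java.util.regex}}, and it assumes the raw header value always has the shape seen in this report (English three-letter month abbreviation, the literal word "at", and a {{GMT}} zone with an optional hour or hour:minute offset).

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses dates such as "16 May 2016 at 09:30:32 GMT+1".
final class MacMailDateParser {

    // Groups: 1 = date, 2 = time, 3 = offset sign, 4 = offset hours, 5 = offset minutes.
    private static final Pattern MAC_MAIL_DATE = Pattern.compile(
            "\\s*(\\d{1,2} \\p{L}{3} \\d{4}) at (\\d{2}:\\d{2}:\\d{2}) "
            + "GMT(?:([+-])(\\d{1,2})(?::(\\d{2}))?)?\\s*");

    // Assumption: month abbreviations are English ("May", "Sep", ...).
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("d MMM uuuu HH:mm:ss", Locale.ENGLISH);

    private MacMailDateParser() {
    }

    /** Returns the parsed date, or null if the value does not match the expected shape. */
    static Date parse(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = MAC_MAIL_DATE.matcher(value);
        if (!m.matches()) {
            return null;
        }
        try {
            LocalDateTime local =
                    LocalDateTime.parse(m.group(1) + " " + m.group(2), LOCAL_PART);
            ZoneOffset offset = ZoneOffset.UTC; // bare "GMT" means no offset
            if (m.group(3) != null) {
                int sign = "-".equals(m.group(3)) ? -1 : 1;
                int hours = Integer.parseInt(m.group(4));
                int minutes = m.group(5) == null ? 0 : Integer.parseInt(m.group(5));
                offset = ZoneOffset.ofHoursMinutes(sign * hours, sign * minutes);
            }
            return Date.from(OffsetDateTime.of(local, offset).toInstant());
        } catch (DateTimeException e) {
            // Invalid calendar values or an out-of-range offset: treat as unparseable.
            return null;
        }
    }
}
```

Something like this would only be tried when {{dateField.getDate()}} comes back {{null}}, with the raw header value passed in, so dates that James already handles are unaffected.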
> Date not extracted from email saved as plain txt
> ------------------------------------------------
>
> Key: TIKA-1970
> URL: https://issues.apache.org/jira/browse/TIKA-1970
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 1.14
> Environment: Debian Linux Jessie
> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> Mac OS X Mail
> Reporter: Philipp Steinkrueger
> Priority: Minor
> Attachments: Testemail-date.eml, Testemail-nodate.txt
>
>
> I have two email testfiles:
> (1) A file that has been created by using "save as" in Mac Mail (this creates
> a .txt file)
> (2) A file that has been created by dragging an email from Mac Mail to the
> Desktop (this creates an .eml file)
> If I feed the files with
> curl -T filename http://localhost:9998/detect/stream
> I get the response "message/rfc822" for both files.
> If I run
> curl -T filename http://localhost:9998/meta
> I get the metadata, but in the case of (1) I do not get the DATE extracted,
> while in case (2) I do.