Tim Allison created TIKA-1976:
---------------------------------
Summary: Add more robust date parsing fallbacks for RFC822 parser
Key: TIKA-1976
URL: https://issues.apache.org/jira/browse/TIKA-1976
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Tim Allison
Priority: Minor
On TIKA-1970, [[email protected]] reported that the
RFC822Parser was not parsing a date in a text file created by Mac Mail. For
kicks, I ran the RFC822Parser against roughly 29k files in our regression
corpus that were identified as rfc822 by Tika, file or Droid, and I found that
~3700 files (~13%) had dates that could not be parsed.
At first look, the failures seem to fall into a few major patterns, and
handling those patterns should fix most of the problems.
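
As a rough illustration of what a fallback could look like, here is a minimal
sketch that tries a list of date patterns in order and returns the first full
match. The class name FallbackDateParser, the pattern list, and the choice to
assume UTC when no zone is given are all hypothetical; they are not the
patterns found in the regression corpus and not the RFC822Parser's actual code.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika: tries several date patterns in turn.
class FallbackDateParser {

    // Illustrative patterns only, ordered from most to least specific.
    private static final String[] PATTERNS = {
        "EEE, d MMM yyyy HH:mm:ss Z", // RFC 822 style with day of week
        "d MMM yyyy HH:mm:ss Z",      // same, day of week omitted
        "EEE, d MMM yyyy HH:mm:ss",   // zone missing (assumption: treat as UTC)
        "yyyy-MM-dd'T'HH:mm:ss"       // ISO-like, zone missing (assumed UTC)
    };

    // Returns the parsed date, or null if no pattern matches the whole string.
    static Date parse(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so create one per attempt.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            format.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(s, pos);
            // Accept only if the pattern consumed the entire input.
            if (date != null && pos.getIndex() == s.length()) {
                return date;
            }
        }
        return null;
    }
}
```

Requiring the whole string to be consumed keeps a shorter pattern from
silently accepting a prefix of a longer date and dropping the time zone.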