Tim Allison created TIKA-1976:
---------------------------------

             Summary: Add more robust date parsing fallbacks for RFC822 parser
                 Key: TIKA-1976
                 URL: https://issues.apache.org/jira/browse/TIKA-1976
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Tim Allison
            Priority: Minor


On TIKA-1970, [[email protected]] reported that the 
RFC822Parser was not parsing a date in a text file created by Mac Mail.  For 
kicks, I ran the RFC822Parser against roughly 29k files in our regression 
corpus identified as rfc822 by Tika, file or Droid, and I found that 
~3700 files (~13%) had dates that weren't parseable. 

At first look, a few major date patterns seem to account for most of the 
failures, so adding fallbacks for those should fix most of the problems.
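
A minimal sketch of what such a fallback chain could look like, in Java 
since Tika is a Java project. This is not the RFC822Parser's actual code: 
the class name FallbackDateParser, the method parse, and the list of 
candidate formats are all assumptions made for illustration, and the 
patterns that actually fail in the corpus are not listed here. Each 
candidate is tried in order and the first one that parses wins.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Tika.
final class FallbackDateParser {

    // Candidate formats, tried in order. The first entry is the standard
    // mail date format; the rest are illustrative fallbacks only.
    private static final List<DateTimeFormatter> CANDIDATES = List.of(
            DateTimeFormatter.RFC_1123_DATE_TIME,
            DateTimeFormatter.ISO_OFFSET_DATE_TIME,
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss Z", Locale.ENGLISH));

    private FallbackDateParser() {
    }

    // Returns the first successful parse, or empty if no candidate matches.
    static Optional<OffsetDateTime> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String text = raw.trim();
        for (DateTimeFormatter candidate : CANDIDATES) {
            try {
                return Optional.of(OffsetDateTime.parse(text, candidate));
            } catch (DateTimeParseException e) {
                // Not this format; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

A caller would use FallbackDateParser.parse(headerValue) and only set the 
date metadata when the result is present. Keeping the candidates in an 
ordered list means a newly observed pattern from the corpus can be added 
as one more entry without touching the loop.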



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
