Sam Stephens created TIKA-3768:
----------------------------------

             Summary: message/rfc822 does not include Headers in extracted text
                 Key: TIKA-3768
                 URL: https://issues.apache.org/jira/browse/TIKA-3768
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.4.0
            Reporter: Sam Stephens
         Attachments: email.txt

When AutoDetectParser is run on message/rfc822 structured text documents, such 
as the attached [^email.txt], the extracted text does not include any of the 
headers, such as the Subject, From, and To lines.

However, these lines contain useful text that I'd like to be able to extract. 
I'm surprised they are missing, given the include-everything bias I saw in 
https://issues.apache.org/jira/browse/TIKA-3710.
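
To make the request concrete, here is a rough sketch of the kind of output I have in mind. It does not use Tika at all: it reads the header block of the raw message with plain JDK classes and prepends selected header lines to whatever text was already extracted. The class name Rfc822HeaderSketch and both method names are hypothetical, invented for this illustration. It assumes headers end at the first empty line and that continuation lines start with a space or tab, and it makes no attempt to decode encoded words or handle malformed input.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika.
class Rfc822HeaderSketch {

    // Reads the header block (everything before the first empty line).
    // Folded continuation lines are joined with a single space.
    // Header names are lower-cased; repeated headers are joined with ", ".
    static Map<String, String> parseHeaders(String rawMessage) {
        Map<String, String> headers = new LinkedHashMap<>();
        String currentName = null;
        for (String line : rawMessage.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // an empty line ends the header block
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Continuation of the previous header field.
                headers.merge(currentName, line.trim(),
                        (a, b) -> a.isEmpty() ? b : a + " " + b);
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                currentName = null; // not a header field; skip it
                continue;
            }
            currentName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.merge(currentName, value, (a, b) -> a + ", " + b);
        }
        return headers;
    }

    // Builds "Name: value" lines for the requested headers, followed by an
    // empty line and the text that was already extracted from the body.
    static String prependHeaders(String rawMessage, String extractedText, String... names) {
        Map<String, String> headers = parseHeaders(rawMessage);
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            String value = headers.get(name.toLowerCase(Locale.ROOT));
            if (value != null) {
                sb.append(name).append(": ").append(value).append('\n');
            }
        }
        return sb.append('\n').append(extractedText).toString();
    }
}
```

Calling prependHeaders(raw, bodyText, "Subject", "From", "To") would give text that starts with those three header lines, which is roughly what I expected the extracted text to contain.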



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
