[jira] [Created] (TIKA-3629) Keywords are not extracted anymore from PDF documents

David Pilato (Jira) Mon, 20 Dec 2021 03:23:08 -0800

David Pilato created TIKA-3629:
----------------------------------

             Summary: Keywords are not extracted anymore from PDF documents
                 Key: TIKA-3629
                 URL: https://issues.apache.org/jira/browse/TIKA-3629
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 2.2.0
            Reporter: David Pilato



Hey

 

I'm seeing some changes (regressions?) in [Tika 2.2.0 (from 
2.1.0)|https://github.com/dadoonet/fscrawler/pull/1330].

When extracting content from Office files (docx, doc, rtf), {{cp:subject}} is 
no longer generated. I'm not using this value anyway, so this may not be an 
issue at all but a feature ;) 

 

However, for PDF documents, I can no longer get the document's keywords.

I was reading the keywords with {{Office.KEYWORDS}}, but the value is now null, 
and I don't see this change documented in the wiki.
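One way to narrow down regressions like these when upgrading is to compare the metadata each version produces. Parse the same file with 2.1.0 and with 2.2.0 and copy each resulting `Metadata` into a plain map, for example via `metadata.names()` and `metadata.getValues(name)`. Then list the keys that disappeared. The code below is a minimal sketch that assumes the metadata has already been copied into such maps. `MetadataDiff` and `droppedKeys` are hypothetical names and are not part of Tika's API.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper (not part of Tika): compares two metadata dumps,
// e.g. one produced by Tika 2.1.0 and one by Tika 2.2.0 for the same file.
public final class MetadataDiff {
    private MetadataDiff() {}

    /**
     * Returns the keys that had at least one non-blank value in {@code before}
     * but have no non-blank value in {@code after}, sorted alphabetically.
     */
    public static SortedSet<String> droppedKeys(Map<String, List<String>> before,
                                                Map<String, List<String>> after) {
        SortedSet<String> dropped = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : before.entrySet()) {
            if (hasValue(entry.getValue()) && !hasValue(after.get(entry.getKey()))) {
                dropped.add(entry.getKey());
            }
        }
        return dropped;
    }

    // True when the list exists and contains at least one non-null, non-blank value.
    private static boolean hasValue(List<String> values) {
        if (values == null) {
            return false;
        }
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

A key whose value became blank counts as dropped, just like a key that vanished entirely. From the caller's point of view, both cases look like the null {{Office.KEYWORDS}} described above.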

 

Is that expected or a bug?

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

