David Pilato created TIKA-3629:
----------------------------------
Summary: Keywords are not extracted anymore from PDF documents
Key: TIKA-3629
URL: https://issues.apache.org/jira/browse/TIKA-3629
Project: Tika
Issue Type: Bug
Components: core
Affects Versions: 2.2.0
Reporter: David Pilato
Hey
I'm seeing some changes (regressions?) in [Tika 2.2.0 (from
2.1.0)|https://github.com/dadoonet/fscrawler/pull/1330].
When extracting content from Office files (docx, doc, rtf), {{cp:subject}} is
no longer generated. I'm not using this value anyway, so that may not be an
issue at all but a feature ;)
But for PDF documents, I can no longer get the keywords of the
document.
I was reading the keywords with {{Office.KEYWORDS}}, but it's now null and I
don't see this change documented in the wiki.
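For reference, here is a minimal sketch of the kind of check that shows the problem. It is not the actual fscrawler code: the class name {{KeywordsCheck}} and the helper {{readKeywords}} are made up for illustration, and it assumes tika-core plus the standard parsers package are on the classpath and that the path to a PDF with keywords is passed as the first argument. It also dumps every metadata key, which may help to see where the keywords ended up, if anywhere.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.Office;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical reproduction class, named for illustration only.
public class KeywordsCheck {

    // Returns the first value stored under Office.KEYWORDS, or null if the key is absent.
    static String readKeywords(Metadata metadata) {
        return metadata.get(Office.KEYWORDS);
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // -1 disables the write limit on the extracted text.
            new AutoDetectParser().parse(in, new BodyContentHandler(-1), metadata, new ParseContext());
        }

        // The value reported as null for PDF documents in this issue.
        System.out.println("Office.KEYWORDS = " + readKeywords(metadata));

        // Dump all keys to check whether the keywords appear under another name.
        for (String name : metadata.names()) {
            System.out.println(name + " = " + metadata.get(name));
        }
    }
}
```

Running the same class against 2.1.0 and 2.2.0 with the same PDF should make any difference in the generated metadata keys visible.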
Is that expected or a bug?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)