[
https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020938#comment-18020938
]
David Pilato commented on TIKA-3629:
------------------------------------
I came across my TODO about this again. Should we close this issue as WONTFIX,
[~tallison]?
> Keywords are not extracted anymore from PDF documents
> -----------------------------------------------------
>
> Key: TIKA-3629
> URL: https://issues.apache.org/jira/browse/TIKA-3629
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 2.2.0
> Reporter: David Pilato
> Priority: Major
>
> Hey
>
> I'm seeing some changes (regressions?) in [Tika 2.2.0 (from
> 2.1.0)|https://github.com/dadoonet/fscrawler/pull/1330].
> When extracting content from Office files (docx, doc, rtf), {{cp:subject}} is
> no longer generated. I'm not using this value anyway, so that may not be an
> issue at all but a feature ;)
>
> But for PDF documents, I can no longer get the document's keywords.
> I was reading the keywords with {{Office.KEYWORDS}}, but it is now null, and I
> don't see this change documented in the wiki.
>
> Is that expected or a bug?
>
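One possible workaround for client code hitting the same null value is to read keywords from several candidate metadata keys instead of relying on {{Office.KEYWORDS}} (whose key name is {{meta:keyword}}). The following is a minimal sketch under stated assumptions, not a confirmed fix. The alternative key names {{pdf:docinfo:keywords}} and {{dc:subject}} are assumptions about where a given Tika version may put PDF keywords, and the helper {{KeywordLookup.findKeywords}} is hypothetical. Metadata is modeled as a plain name-to-values map so the sketch runs without Tika on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: returns the first non-empty keyword list found under a
 * set of candidate metadata keys. The key names are ASSUMPTIONS; check them
 * against the metadata your Tika version actually emits.
 */
final class KeywordLookup {

    // Candidate keys, checked in this order:
    //   "meta:keyword"         - the name behind Office.KEYWORDS
    //   "pdf:docinfo:keywords" - raw PDF DocInfo Keywords entry (assumed)
    //   "dc:subject"           - Dublin Core subject (assumed)
    static final List<String> CANDIDATE_KEYS =
            List.of("meta:keyword", "pdf:docinfo:keywords", "dc:subject");

    private KeywordLookup() {}

    /** Metadata is modeled as name -> values, mirroring Tika's multi-valued fields. */
    static List<String> findKeywords(Map<String, String[]> metadata) {
        for (String key : CANDIDATE_KEYS) {
            String[] values = metadata.get(key);
            if (values != null && values.length > 0) {
                // First key that carries at least one value wins.
                return Arrays.asList(values);
            }
        }
        // No candidate key present: report "no keywords" rather than null.
        return Collections.emptyList();
    }
}
```

With a real Tika {{Metadata}} object, the map can be built from {{metadata.names()}} and {{metadata.getValues(name)}}. Dumping those names for a sample PDF is the most reliable way to see which key the installed Tika version actually populates.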