[jira] [Comment Edited] (TIKA-3629) Keywords are not extracted anymore from PDF documents

Tim Allison (Jira) Mon, 20 Dec 2021 05:38:06 -0800


    [ 
https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462615#comment-17462615
 ]


Tim Allison edited comment on TIKA-3629 at 12/20/21, 1:37 PM:
--------------------------------------------------------------

May be caused as part of this?

{noformat}
   * Remove duplicate "subject" metadata keys that were intended
     for backwards compatibility within 1.x only (TIKA-3564).
{noformat}

I'll update the documentation.  As long as you aren't missing information, I 
_think_ we're ok to respin 2.2.1 as is.


was (Author: [email protected]):
May be caused by this?

{noformat}
   * Remove duplicate "subject" metadata keys that were intended
     for backwards compatibility within 1.x only (TIKA-3564).
{noformat}

I'll update the documentation.  As long as you aren't missing information, I 
_think_ we're ok to respin 2.2.1 as is.

> Keywords are not extracted anymore from PDF documents
> -----------------------------------------------------
>
>                 Key: TIKA-3629
>                 URL: https://issues.apache.org/jira/browse/TIKA-3629
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.2.0
>            Reporter: David Pilato
>            Priority: Major
>
> Hey
>  
> I'm seeing some changes (regressions?) in [Tika 2.2.0 (from 
> 2.1.0)|https://github.com/dadoonet/fscrawler/pull/1330].
> When extracting content from Office files (docs, doc, rtf), {{cp:subject}} is 
> no longer generated. I'm not using this value anyway, so that may not be an 
> issue at all but a feature ;) 
>  
> But for PDF documents, I can no longer get the keywords for the 
> document.
> I was reading the keywords with {{Office.KEYWORDS}}, but it's now null, and I 
> don't see this change documented in the wiki.
>  
> Is that expected or a bug?
>  
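
One possible way for a caller to cope with a metadata key that moved between releases is to check several candidate keys and take the first one that has values. The sketch below is only an illustration using a plain Java map, not the Tika API: the helper class KeywordLookup is hypothetical, and the key names in CANDIDATE_KEYS are assumptions, since this thread does not confirm which key Tika 2.2.x uses for PDF keywords.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: returns the values stored under the
// first candidate key that has at least one value.
final class KeywordLookup {

    // Assumed key names, for illustration only. Check the Tika release notes
    // for the key your version actually populates.
    static final String[] CANDIDATE_KEYS = {
        "meta:keyword", "dc:subject", "pdf:docinfo:keywords"
    };

    static List<String> firstPresent(Map<String, List<String>> metadata, String... keys) {
        for (String key : keys) {
            List<String> values = metadata.get(key);
            if (values != null && !values.isEmpty()) {
                return values;
            }
        }
        // None of the candidate keys carried a value.
        return List.of();
    }

    public static void main(String[] args) {
        // Stand-in for metadata extracted from a document.
        Map<String, List<String>> extracted =
            Map.of("dc:subject", List.of("tika", "pdf"));
        System.out.println(firstPresent(extracted, CANDIDATE_KEYS));
    }
}
```

With a lookup like this, a caller does not have to depend on a single key staying stable across versions, though the order of the candidates decides which value wins when more than one key is populated.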



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
