[
https://issues.apache.org/jira/browse/TIKA-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986164#comment-17986164
]
Hudson commented on TIKA-4442:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-branch_3x-jdk11 #2096 (See
[https://ci-builds.apache.org/job/Tika/job/tika-branch_3x-jdk11/2096/])
TIKA-4442: fix javadoc (tilman:
[https://github.com/apache/tika/commit/77b1d1e06f3f32006fe236e8efdde67ea37eab3a])
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Metadata.java
> PDFParser does not list all metadata extracted by PDFBox
> --------------------------------------------------------
>
> Key: TIKA-4442
> URL: https://issues.apache.org/jira/browse/TIKA-4442
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 3.2.0
> Environment: * Docker container based on python:3-slim
> * Debian 12.11
> * Python 3.13.5
> * openjdk 17.0.15 2025-04-15
> * tika-server-standard-3.2.0.jar
> * pdfbox-app-3.0.5.jar
> * PyPDF 5.6.1
> Reporter: Peter Hoogendijk
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: xmp
> Fix For: 4.0.0, 3.2.2
>
> Attachments: lorem-ipsum.pdf, lorem-ipsum.xml
>
>
> While using Apache Tika to extract metadata from PDF files, I found the
> following XMP metadata entries to be missing:
> * dc:identifier
> * dc:language
> * dc:publisher
> * dc:relation
> * dc:source
> * dc:type
> Python (PyPDF2) and PDFBox (as used by Tika's PDFParser) do show these XMP
> metadata entries, so I expected Apache Tika to also extract these entries.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)