[ https://issues.apache.org/jira/browse/TIKA-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362488#comment-14362488 ]
Tyler Palsulich commented on TIKA-1161:
---------------------------------------
I'm seeing the following metadata (no date field) with Tika 1.8-SNAPSHOT:
{code}
Author: Luke Bo'sher
Content-Length: 576757
Content-Type: application/pdf
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.pdf.PDFParser
access_permission:assemble_document: true
access_permission:can_modify: true
access_permission:can_print: true
access_permission:can_print_degraded: true
access_permission:extract_content: true
access_permission:extract_for_accessibility: true
access_permission:fill_in_form: true
access_permission:modify_annotations: true
creator: Luke Bo'sher
dc:creator: Luke Bo'sher
dc:format: application/pdf; version=1.3
dc:title: Microsoft Word - WorkChoices Submission.doc
meta:author: Luke Bo'sher
pdf:PDFVersion: 1.3
pdf:encrypted: false
producer: Mac OS X 10.4.7 Quartz PDFContext
resourceName: WF_16_Youth_Coalition.pdf
title: Microsoft Word - WorkChoices Submission.doc
xmp:CreatorTool: Word
xmpTPg:NPages: 20
{code}
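A dump like the one above can be checked for date fields mechanically. The following is only a rough sketch: the class and method names are hypothetical (not part of Tika), and the rule that a date-related key contains "date", "created" or "modified" is an assumed heuristic, not Tika's own definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tika: scans "key: value" lines from a
// metadata dump and returns the keys that look date-related.
class MetadataDateCheck {

    static List<String> dateKeys(String dump) {
        List<String> hits = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            // Split at the first ": " because keys such as
            // "access_permission:can_print" contain colons themselves.
            int sep = line.indexOf(": ");
            if (sep < 0) {
                continue;
            }
            String key = line.substring(0, sep).trim();
            String lower = key.toLowerCase(Locale.ROOT);
            // Assumed heuristic for what counts as a date field.
            if (lower.contains("date") || lower.contains("created")
                    || lower.contains("modified")) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A few lines taken from the dump above.
        String dump = String.join("\n",
                "Author: Luke Bo'sher",
                "Content-Type: application/pdf",
                "dc:format: application/pdf; version=1.3",
                "pdf:PDFVersion: 1.3",
                "xmpTPg:NPages: 20");
        System.out.println(dateKeys(dump)); // prints [] for these lines
    }
}
```

Run against the full dump, this heuristic should also come back empty, which matches the "no date field" observation.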
> Dates incorrectly extracted from PDF
> ------------------------------------
>
> Key: TIKA-1161
> URL: https://issues.apache.org/jira/browse/TIKA-1161
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Environment: Windows 7 64bit, JDK 1.7
> Reporter: Nicolas Guillaumin
> Priority: Minor
> Labels: pdf
> Attachments: WF_16_Youth_Coalition.pdf
>
>
> Tika extracts the date of the attached PDF as 5034-09-24T14:03:00Z,
> whereas the actual date seems to be 2007-03-01 10:58:57 according to
> Foxit Reader.
> Interestingly, PDFBox 1.8.2 also extracts the correct date (when using
> the PDFDebugger tool).
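
For reference, PDF date strings normally take the form D:YYYYMMDDHHmmSS followed by an optional time-zone offset. The sketch below shows how such a string would map to the date Foxit reports. The raw value used here is hypothetical, since the actual string stored in the attachment is not quoted in this issue, and the class is purely illustrative (it is not the parsing code used by Tika or PDFBox).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative only: not the parsing code used by Tika or PDFBox.
class PdfDateSketch {

    private static final DateTimeFormatter FIELDS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Parses the leading D:YYYYMMDDHHmmSS part of a PDF date string and
    // ignores any trailing time-zone offset.
    static LocalDateTime parse(String raw) {
        String s = raw.startsWith("D:") ? raw.substring(2) : raw;
        if (s.length() < 14) {
            throw new IllegalArgumentException(
                    "expected at least YYYYMMDDHHmmSS: " + raw);
        }
        return LocalDateTime.parse(s.substring(0, 14), FIELDS);
    }

    public static void main(String[] args) {
        // Hypothetical raw value consistent with the date Foxit shows.
        System.out.println(parse("D:20070301105857")); // prints 2007-03-01T10:58:57
    }
}
```

A year such as 5034 cannot come out of a well-formed string of this shape for a 2007 document, so the raw value in the attachment is probably worth inspecting directly.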