[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893426#comment-13893426 ]

Tim Allison commented on TIKA-1232:
-----------------------------------

[~anjackson], yes, I'd like to add your code if others agree that it would be
useful. There's no need for a formal patch; I'll take your GitHub code nearly
directly.

Two items:
  1) Would you be interested in contributing your extension-level extraction
code to PDFBox, if it doesn't already exist there? (I haven't checked, but I
assume you wouldn't reinvent the wheel.) I think that code would be more at
home within PDFBox.
  2) How much testing have you done for potential exceptions thrown by PDFBox
on PDFs in the wild when grabbing this new metadata (cf. the null-pointer
checks around date parsing in the current metadata code, and TIKA-1226,
TIKA-1232, TIKA-1233)?

Thank you, again.

> Add PDF version to PDFParser output
> -----------------------------------
>
>                 Key: TIKA-1232
>                 URL: https://issues.apache.org/jira/browse/TIKA-1232
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>         Environment: JDK6
>            Reporter: William Palmer
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: pdfversion.patch
>
>
> I'd like to identify the PDF version of files. This is not currently reported
> by the PDFParser, although the information is available via PDFBox. I have
> attached a patch that adds the format version to the Metadata object.
> However, I am not familiar enough with the Tika source to know whether an
> alternative metadata key should be used or this new one added.
> Comments welcome.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
