[ https://issues.apache.org/jira/browse/PDFBOX-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978941#comment-14978941 ]

Tilman Hausherr commented on PDFBOX-3068:
-----------------------------------------

Sorry for having pointed at you; the problem is indeed in our code. For some
reason, the object references in the /Info dictionary don't work.

It did work in rev 1633496, which is about a year old.

I get this when running a test; it may or may not be related:
{code}
28.10.2015 19:02:46.073 DEBUG [main] org.apache.pdfbox.pdfparser.COSParser:1267 - Stop checking xref offsets as at least one couldn't be dereferenced
28.10.2015 19:02:46.110 DEBUG [main] org.apache.pdfbox.pdfparser.COSParser:1277 - Replaced read xref table with the results of a brute force search
{code}
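
For readers unfamiliar with the second log line, here is a rough sketch of
what a brute force search for objects can look like: scan the raw bytes for
"<number> <generation> obj" headers and record where each one starts. This is
only an illustration of the general idea, not the COSParser code. The class
name, method name and sample data are made up for the example, and a real
parser has to cope with cases this ignores (for instance matching text inside
stream data).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of PDFBox.
public class BruteForceXrefSketch {

    // Matches "<objNumber> <generation> obj", not preceded by another digit.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns a map from "objNumber generation" to the byte offset of the header.
    public static Map<String, Long> scanForObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so a char index
        // in the string equals the byte offset in the file.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            // If an object is defined twice, the later definition wins here.
            offsets.put(m.group(1) + " " + m.group(2), (long) m.start(1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Tiny made-up fragment: object 1 plays the role of the /Info dictionary.
        String pdf = "%PDF-1.4\n"
                + "1 0 obj\n<< /Title (Example) >>\nendobj\n"
                + "2 0 obj\n<< /Type /Catalog >>\nendobj\n"
                + "trailer\n<< /Info 1 0 R /Root 2 0 R >>\n";
        Map<String, Long> offsets =
                scanForObjects(pdf.getBytes(StandardCharsets.ISO_8859_1));
        offsets.forEach((key, offset) -> System.out.println(key + " -> " + offset));
    }
}
```

With a table like this, a reference such as "/Info 1 0 R" in the trailer can
be resolved by looking up "1 0" and parsing the object found at that offset.
If that lookup fails, every value read through the reference ends up null.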

> Null metadata in some files that had metadata in 1.8.10
> -------------------------------------------------------
>
>                 Key: PDFBOX-3068
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3068
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tim Allison
>              Labels: regression
>             Fix For: 2.0.0
>
>         Attachments: NZAZKTQYKDD2HSBCSJJN6XSEA4KJEONU
>
>
> Tilman's observation on 'Microsoft' below revealed 1) that we should use our 
> BodyContentHandler so that title metadata doesn't slip into the body content 
> and 2) that the title and all metadata values from PDDocumentInformation are 
> null for at least: NZ/NZAZKTQYKDD2HSBCSJJN6XSEA4KJEONU
> {code}
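>         Path p = Paths.get("..NZAZKTQYKDD2HSBCSJJN6XSEA4KJEONU");
>         // try-with-resources closes the document when done
>         try (PDDocument d = PDDocument.load(p.toFile())) {
>             // the title comes back null even though 8 metadata keys are present
>             assertNull(d.getDocumentInformation().getTitle());
>             assertEquals(8, d.getDocumentInformation().getMetadataKeys().size());
>         }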
> {code} 
> Having manually reviewed a handful of documents in the 
> metadata/metadata_value_count_diffs.csv file 
> [here|https://github.com/tballison/share/blob/master/pdfbox_comparisons/pdfbox_1_8_10V2_0_20151023.zip],
> I think this is quite pervasive, unless I'm loading the documents and 
> metadata incorrectly.


