[ https://issues.apache.org/jira/browse/PDFBOX-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241334#comment-14241334 ]

adam brin commented on PDFBOX-1886:
-----------------------------------

Tilman,
  I think this may warrant marking as something equivalent to "fixed but
watch." I can 100% confirm that it was an issue as of, and before, Feb '14.
However, I cannot reproduce it in Adobe Acrobat 10.1.13, although we could in
earlier versions of Acrobat. Our test was to search for words on the second
page of the document and make sure that they were discoverable. More details
are in our internal issue (https://dev.tdar.org/jira/browse/TDAR-3929), but the
documents there require a (free) registration, and I wanted to make this
easier to reproduce.
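
A rough sketch of automating that check follows. It is not the test described
above, which was a manual search in Acrobat. It assumes the page text has
already been extracted from the merged PDF, for example with PDFBox's
`PDFTextStripper` after calling `setStartPage(2)` and `setEndPage(2)` and then
`getText(document)`. The `WordCheck` class and its `findMissingWords` method are
hypothetical names, and the matching (case-insensitive, whitespace collapsed)
only approximates a viewer's search, so a pass here does not prove that a given
Acrobat version will find the words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: given the text extracted from one page, report which
 * search terms are NOT discoverable in it. Loosely mimics a viewer search by
 * ignoring case and treating any run of whitespace (including line breaks)
 * as a single space.
 */
final class WordCheck {

    private WordCheck() {
    }

    /** Lower-cases the input and collapses whitespace runs to one space. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    /** Returns, in input order, the terms that do not occur in pageText. */
    static List<String> findMissingWords(String pageText, List<String> terms) {
        String haystack = normalize(pageText);
        List<String> missing = new ArrayList<>();
        for (String term : terms) {
            if (!haystack.contains(normalize(term))) {
                missing.add(term);
            }
        }
        return missing;
    }
}
```

An empty result means every term was found in the extracted page text; a
merged file whose OCR layer was stripped would typically report all terms as
missing.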

Thanks,

adam

> Merge Function strips OCR layer in acrobat
> ------------------------------------------
>
>                 Key: PDFBOX-1886
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1886
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.4
>            Reporter: adam brin
>             Fix For: 2.1.0
>
>         Attachments: cover_page4818280580458469287.pdf, page1.pdf, 
> santa-cruz-flats-project-part-2 (1).pdf
>
>
> We use the PDFMergerUtility to add cover pages to documents automatically.
> We're finding that when we do so, the OCR text layer of the source files is
> stripped from the merged output.
> {code}
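>         // Assumes java.io.File, java.io.FileOutputStream and (in 1.8.x)
>         // org.apache.pdfbox.util.PDFMergerUtility are imported.
>         PDFMergerUtility merger = new PDFMergerUtility();
>         // File.createTempFile requires a prefix and a suffix
>         File outputFile = File.createTempFile("merged", ".pdf");
>         try (FileOutputStream out = new FileOutputStream(outputFile)) {
>             merger.setDestinationStream(out);
>             for (File file : files) {
>                 merger.addSource(file);
>             }
>             merger.mergeDocuments();
>         }
>         return outputFile;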
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)