[ 
https://issues.apache.org/jira/browse/PDFBOX-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552868#comment-14552868
 ] 

Tilman Hausherr commented on PDFBOX-2809:
-----------------------------------------

You're probably not using the latest version, which is 1.8.9. Here's some code 
for you:
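{code}
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

        File file = new File("SC1401_RPEP0040.pdf");

        // Load with the non-sequential parser (PDFBox 1.8.x API)
        PDDocument pdfDocument = PDDocument.loadNonSeq(file, null);
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            // Sort text by its position on the page before extracting it
            stripper.setSortByPosition(true);
            String text = stripper.getText(pdfDocument);

            System.out.println(text);
        } finally {
            // Close the document even if extraction throws
            pdfDocument.close();
        }
{code}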
I think your error description meant that the header was at the wrong position. 
Here's the output now, with the "sort" option:
{quote}
DNIT - Sistema de Custos Rodoviários SICRO2
(Valores em R$) Preço Unitário dos Equipamentos RPEP0040
SC - Santa Catarina Pesquisa: 20/01/2014
Código Equipamento Aquisição Improdutivo Operativo
A001 Componente para equipamento : Caterpillar : D8T -  trator  2.127.987,73          9,4413        365,1711
de esteiras
A002 Componente para equipamento : Caterpillar : R-8 -    165.411,53          0,0000         17,0006
escarificador
A003 Componente para veículos : Mercedes Benz : ATEGO 1319    178.200,00         11,5123         77,8805
-  chassis 7,1 t (p/ caminhão)
{quote}

So DNIT is now at the top. I hope that is what you wanted :-)
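
For anyone wondering why the sort option matters: a PDF can store its text in any order, so without sorting, a header can come out after the body text. Here is a rough sketch of the idea in plain Java. It is not PDFBox's actual implementation; the class, the method, and the coordinates are made up for illustration, and it only joins fragments whose y values are exactly equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: this is NOT PDFBox code, and all names here are hypothetical.
class PositionSortSketch {

    // A piece of text with the coordinates of its top-left corner.
    // y grows downward, so a smaller y means closer to the top of the page.
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Returns the text in reading order: top to bottom, then left to right.
    // Fragments with exactly the same y are joined into one line.
    static String textInReadingOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<Fragment>(fragments);
        sorted.sort(Comparator.<Fragment>comparingDouble(f -> f.y)
                .thenComparingDouble(f -> f.x));

        StringBuilder out = new StringBuilder();
        Fragment previous = null;
        for (Fragment f : sorted) {
            if (previous != null) {
                out.append(f.y == previous.y ? " " : "\n");
            }
            out.append(f.text);
            previous = f;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stored order: table heading first, page header last (made-up coordinates).
        List<Fragment> stored = Arrays.asList(
                new Fragment(300f, 120f, "Equipamento"),
                new Fragment(40f, 120f, "Código"),
                new Fragment(40f, 30f, "DNIT - Sistema de Custos Rodoviários SICRO2"));

        // After sorting, the DNIT header line comes first.
        System.out.println(textInReadingOrder(stored));
    }
}
```

With `setSortByPosition(false)`, which is the default, the stripper keeps the order in which the text is stored in the file, and that is what the unsorted input list stands for here.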



> Error trying to read the header of all the pages of a document
> --------------------------------------------------------------
>
>                 Key: PDFBOX-2809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2809
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>         Environment: Java
>            Reporter: João Gabriel Ferrazza Dias
>            Priority: Critical
>         Attachments: SC1401_RPEP0040.pdf, Test.java
>
>
> I am trying to read a document with a lot of pages,
> and the header of every page came out as different text.
> I am sending the test class and the document I am trying to read.



