[
https://issues.apache.org/jira/browse/PDFBOX-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552868#comment-14552868
]
Tilman Hausherr commented on PDFBOX-2809:
-----------------------------------------
You're probably not using the latest version, which is 1.8.9. Here's some code
for you:
{code}
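// PDFBox 1.8.x: needs java.io.File, org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.util.PDFTextStripper
File file = new File("SC1401_RPEP0040.pdf");
// loadNonSeq uses the non-sequential parser
PDDocument pdfDocument = PDDocument.loadNonSeq(file, null);
try {
    PDFTextStripper stripper = new PDFTextStripper();
    // emit text in page-position order instead of content-stream order
    stripper.setSortByPosition(true);
    String text = stripper.getText(pdfDocument);
    System.out.println(text);
} finally {
    pdfDocument.close(); // release the document even if extraction fails
}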
{code}
I think your error description meant that the header was at the wrong position.
Here's the output now, with the "sort" option:
{quote}
DNIT - Sistema de Custos Rodoviários SICRO2
(Valores em R$) Preço Unitário dos Equipamentos RPEP0040
SC - Santa Catarina Pesquisa: 20/01/2014
Código Equipamento Aquisição Improdutivo Operativo
A001 Componente para equipamento : Caterpillar : D8T - trator 2.127.987,73
9,4413 365,1711
de esteiras
A002 Componente para equipamento : Caterpillar : R-8 - 165.411,53
0,0000 17,0006
escarificador
A003 Componente para veículos : Mercedes Benz : ATEGO 1319 178.200,00
11,5123 77,8805
- chassis 7,1 t (p/ caminhão)
{quote}
So DNIT is now at the top. I hope that is what you wanted :-)
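
To illustrate why the "sort" option moves the header, here is a rough, self-contained sketch of the idea. It is not PDFBox's actual implementation: the `Fragment` class, the `join` and `sample` helpers and the coordinates are all made up for illustration, and it assumes y is measured downward from the top of the page. The real comparator in PDFBox is more involved (it also tolerates small vertical offsets within a line, for example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortByPositionSketch {
    /** Hypothetical stand-in for a positioned piece of text; not a PDFBox class. */
    static final class Fragment {
        final float x;
        final float y; // assumed: distance from the top of the page
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Joins fragments one per line, either in stream order or sorted top-to-bottom, left-to-right. */
    static String join(List<Fragment> fragments, boolean sortByPosition) {
        List<Fragment> ordered = new ArrayList<>(fragments);
        if (sortByPosition) {
            ordered.sort(Comparator
                    .comparingDouble((Fragment f) -> f.y)
                    .thenComparingDouble(f -> f.x));
        }
        StringBuilder sb = new StringBuilder();
        for (Fragment f : ordered) {
            sb.append(f.text).append('\n');
        }
        return sb.toString();
    }

    /** Made-up page: the header is drawn last in the stream but sits at the top of the page. */
    static List<Fragment> sample() {
        return Arrays.asList(
                new Fragment(20f, 60f, "Codigo Equipamento"),
                new Fragment(20f, 80f, "A001"),
                new Fragment(20f, 20f, "DNIT"));
    }

    public static void main(String[] args) {
        System.out.println("Stream order:");
        System.out.print(join(sample(), false));
        System.out.println("Sorted by position:");
        System.out.print(join(sample(), true));
    }
}
```

Without sorting, the made-up header comes out last because it is last in the stream; with sorting it comes out first because it has the smallest y.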
> Error trying to read the header of all the pages of a document
> --------------------------------------------------------------
>
> Key: PDFBOX-2809
> URL: https://issues.apache.org/jira/browse/PDFBOX-2809
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Environment: Java
> Reporter: João Gabriel Ferrazza Dias
> Priority: Critical
> Attachments: SC1401_RPEP0040.pdf, Test.java
>
>
> I am trying to read a document with a lot of pages,
> and the header of all pages came as another text.
> I am sending the test class and the document I am trying to read.