Hello all, Any suggestions for extracting text from PDF files? I have tried pdfbox, and it works nicely, but if the PDF is structured it won't give good results. For example, consider this PDF:
P1 Lorem Ipsum Bla bla P3 Lorem2 Ipsum2 P1 bla bla P2 bla bla bla P2 bla bla bla

Above, P1, P2 and P3 are meaningful paragraphs or fields. pdfbox will convert this to

P1 Lorem Ipsum Bla bla P3 Lorem2 Ipsum2 P1 bla bla

which is not useful to me, because text from different paragraphs ends up mixed together. The Unix program pdf2text can convert while keeping the text positions, but I wanted to ask if you know of something better.

Best,
-C.B.
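
To make the desired result concrete, here is a rough sketch of position-aware grouping. It assumes the extraction tool can report each text run with the x/y coordinates of its left edge; the `Fragment` type, the `group_into_columns` helper, the coordinates and the `gap` threshold are all made up for illustration and are not part of pdfbox or any other library.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One run of text with its position on the page (hypothetical type)."""
    x: float   # left edge of the text run
    y: float   # distance from the top of the page
    text: str


def group_into_columns(fragments: List[Fragment], gap: float = 50.0) -> List[List[str]]:
    """Split fragments into columns by left edge, then read each column top to bottom.

    A new column starts whenever the left edge jumps by more than `gap`
    relative to the previous fragment (in left-to-right order).
    """
    columns: List[List[Fragment]] = []
    last_x = None
    for frag in sorted(fragments, key=lambda f: f.x):
        if last_x is None or frag.x - last_x > gap:
            columns.append([])
        columns[-1].append(frag)
        last_x = frag.x
    # Within a column, order by vertical position so paragraphs stay together.
    return [[f.text for f in sorted(col, key=lambda f: f.y)] for col in columns]


# Assumed coordinates modelling the example page: P1/P2 on the left, P3 on the right.
page = [
    Fragment(50, 100, "P1 Lorem Ipsum Bla bla"),
    Fragment(300, 100, "P3 Lorem2 Ipsum2"),
    Fragment(50, 115, "P1 bla bla"),
    Fragment(50, 140, "P2 bla bla bla"),
    Fragment(50, 155, "P2 bla bla bla"),
]

columns = group_into_columns(page)
for column in columns:
    print(" / ".join(column))
```

With this grouping, the P1 and P2 lines stay together in the left column and P3 comes out on its own, instead of P3 being spliced into the middle of P1. A real page would need a better column test than a fixed gap.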