I had the latest revision. When I compiled everything from the command-line and started PDFReader from there, everything looked fine. Due to bad experiences with the Eclipse Maven plug-ins, I set up the PDFBox project in Eclipse by hand, and in that setup the characters are drawn on top of each other. I don't know yet where the difference lies.

While going through this experiment, I noticed that it's currently not that easy to compile PDFBox and just run PDFReader without first setting up a batch script with the right classpath. The instructions on [1] are also incorrect, as PDFBox doesn't have a Class-Path manifest entry (which is a good thing, really). I guess we could add additional Ant targets to run the various command-line tools. Batik does that. That would make it easier for people to evaluate PDFBox quickly. Maybe I'll have time to look into this at some point (no promises just yet).

[1] http://pdfbox.apache.org/commandlineutilities/PDFReader.html

On 01.10.2010 17:01:34 Andreas Lehmkühler (JIRA) wrote:
> What version are you using? The latest trunk version (1003396) includes
> a fix for the extraction/rendering of text and one of the key issues
> was the handling of the TJ operator. See PDFBOX-828 for further details.
> After applying your proposed patch to the latest trunk everything seems
> to be fine. I can't see any problem with the TJ operator. I'm attaching
> the result of PDFToImage.

Jeremias Maerki
