[ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039991#comment-13039991 ]
Roland Quast commented on PDFBOX-1018:
--------------------------------------
Hope you enjoyed my little presentation :-)
I tried jai_imageio.jar as you described and it worked perfectly! Previously I
tried jai_core.jar and jai_codec.jar which didn't work. As I mentioned in the
video, someone told me that JAI is required to get these PDF files to read, but
I didn't know it was jai_imageio.jar.
There are still a few problems though:
1. It shouldn't fail silently. It should throw an exception. It is hard to unit
test if you can't catch an exception and you have to assert that the output
isn't a white image.
2. The message it throws should be meaningful, not a "null" message on
something that is hard to understand.
3. It should mention that you have to use jai_imageio.jar at least in that
warning message, or somewhere in the main pages of the PDFBox site.
4. The JAI license is not exactly a commercially friendly license. One of the
great advantages of using PDFBox is that it uses an Apache license.
5. The jai_imageio.jar contains native code, which requires a different jar for
each platform. For instance, our app supports Windows, Mac and Linux. We'd have
to use some kind of jar loader to pick the right imageio jar... and that is
very difficult.
6. The majority of scanners, when they scan in black and white, will use that
codec for PDF.
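On point 1: until the failure surfaces as an exception, a unit test has to
inspect the rendered pixels itself. A minimal sketch of such a check, using
only the JDK. The class and method names are made up for illustration and are
not part of PDFBox:

```java
import java.awt.image.BufferedImage;

// Hypothetical test helper, not part of PDFBox.
final class BlankImageCheck {

    /** Returns true if the RGB value of every pixel is white; alpha is ignored. */
    static boolean isAllWhite(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // Mask off the alpha byte and compare the RGB part to 0xFFFFFF.
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0x00FFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: a small image filled with white.
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 2; x++) {
                image.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println(isAllWhite(image));
        // One black pixel is enough to make the check fail.
        image.setRGB(0, 0, 0x000000);
        System.out.println(isAllWhite(image));
    }
}
```

A test would pass the image returned by convertToImage to isAllWhite and fail
if it returns true. That works, but it is exactly the kind of workaround an
exception would make unnecessary.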
Having said all of that, what about the chance of using Commons Sanselan
(another Apache project), which already includes a decoder for that format? The
advantage of Sanselan is that it is pure Java code (not a native binary), it is
Apache licensed and it is stable and mature. I am also assuming it could be
distributed with PDFBox at one stage if required.
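To illustrate point 5, this is roughly the kind of platform switch a
native-code dependency would force on an application, and which a pure Java
decoder would avoid. It is a sketch only: the class name and the jar file
names are hypothetical and do not refer to real artifacts.

```java
import java.util.Locale;

// Hypothetical helper; the jar names below are placeholders, not real files.
final class NativeJarPicker {

    enum Platform { WINDOWS, MAC, LINUX, UNKNOWN }

    /** Maps an os.name value (e.g. "Windows 7", "Mac OS X", "Linux") to a platform. */
    static Platform detect(String osName) {
        String os = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        // Check Mac first: "darwin" contains "win" and would otherwise match Windows.
        if (os.contains("mac") || os.contains("darwin")) {
            return Platform.MAC;
        }
        if (os.contains("win")) {
            return Platform.WINDOWS;
        }
        if (os.contains("linux")) {
            return Platform.LINUX;
        }
        return Platform.UNKNOWN;
    }

    /** Returns the placeholder jar name for a platform, or throws if there is none. */
    static String jarFor(Platform platform) {
        switch (platform) {
            case WINDOWS: return "imageio-native-windows.jar";
            case MAC:     return "imageio-native-mac.jar";
            case LINUX:   return "imageio-native-linux.jar";
            default:
                // Fail loudly instead of silently loading nothing.
                throw new IllegalStateException("No native jar for platform " + platform);
        }
    }

    public static void main(String[] args) {
        System.out.println(detect(System.getProperty("os.name")));
    }
}
```

Every platform we ship on would need its own branch here plus its own tested
binary, and any platform not on the list gets no decoder at all.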
If you feel like it would take too much time, please let me know and I can try
to hack up something that works with Sanselan.
> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
> Key: PDFBOX-1018
> URL: https://issues.apache.org/jira/browse/PDFBOX-1018
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
> Environment: JDK 1.6.0_22
> Reporter: Roland Quast
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Labels: pdfbox
> Attachments: BlackAndWhiteBug.java, ColorWorks.java,
> PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>
> This bug has been reported in various other tickets submitted before. I am
> attempting to conclusively prove that this is an issue, and it needs to be
> attended to since all past tickets regarding this bug have been marked
> invalid.
> I have attached a video showing very basic code that will reproduce the
> issue. I have also attached the code that causes the issue, as well as a PDF
> file that works (a color one), and a black and white PDF file that doesn't.
> The main issue is that when reading a black and white PDF file (see attached
> black and white pdf file), the following message is displayed, and the
> contents of the output image are completely white.
> 26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> WARNING: getRGBImage returned NULL
> We use PDFBox in our program for reading PDF files, and at least 50 percent
> of our customers' PDF files (from different scanners) will not read because
> of this issue. This is a complete show stopper, and I'd be more than happy to
> help in any way I could to resolve it.