[
https://issues.apache.org/jira/browse/PDFBOX-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501369#comment-14501369
]
Tilman Hausherr edited comment on PDFBOX-2694 at 4/25/15 6:22 PM:
------------------------------------------------------------------
I ran another test with the digitalcorpora set of 250,000 PDFs and there were no
unchecked exceptions. My suggestion is that we mention twelvemonkeys on
https://pdfbox.apache.org/2.0/dependencies.html as an optional dependency. The
following text could be included below the text "To write TIFF images a JAI
ImageIO Core library will be needed":
{quote}
For better support of embedded JPEG files in PDFBox 2.0, use TwelveMonkeys
ImageIO https://github.com/haraldk/TwelveMonkeys , version 3.1.0 or higher.
{code}
<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.1.0</version>
</dependency>
{code}
Note that JDK 1.7 is required for twelvemonkeys, while PDFBox 2.0 requires only
JDK 1.6.
{quote}
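To check whether the optional dependency is actually picked up at runtime, one could list the JPEG readers that ImageIO has registered. This is only a minimal sketch: the class name JpegReaderCheck and its helper methods are hypothetical, and it assumes that the plugin's reader classes live under the com.twelvemonkeys package, as the Maven groupId suggests. It uses only the standard javax.imageio API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of PDFBox or TwelveMonkeys.
class JpegReaderCheck {

    // Collects the class names of all ImageIO readers registered for JPEG.
    static List<String> jpegReaderClassNames() {
        List<String> names = new ArrayList<String>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JPEG");
        while (readers.hasNext()) {
            names.add(readers.next().getClass().getName());
        }
        return names;
    }

    // Assumption: TwelveMonkeys reader classes start with "com.twelvemonkeys.".
    static boolean hasTwelveMonkeys(List<String> readerClassNames) {
        for (String name : readerClassNames) {
            if (name.startsWith("com.twelvemonkeys.")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = jpegReaderClassNames();
        for (String name : names) {
            System.out.println("JPEG reader: " + name);
        }
        System.out.println("TwelveMonkeys JPEG plugin found: " + hasTwelveMonkeys(names));
    }
}
```

Run with the imageio-jpeg jar on the classpath, the last line should report whether a reader from that package was registered; without the jar, only the JDK's own JPEG reader is expected to be listed.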
was (Author: tilman):
I ran another test with the digitalcorpora set of 250,000 PDFs and there were no
unchecked exceptions. My suggestion is that we mention twelvemonkeys on
https://pdfbox.apache.org/2.0/dependencies.html as an optional dependency. The
following text could be included below the text "To write TIFF images a JAI
ImageIO Core library will be needed":
{quote}
For better support of embedded JPEG files, use TwelveMonkeys ImageIO
https://github.com/haraldk/TwelveMonkeys , version 3.1.0 or higher.
{code}
<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.1.0</version>
</dependency>
{code}
{quote}
> Evaluate twelvemonkeys for JPEG
> -------------------------------
>
> Key: PDFBOX-2694
> URL: https://issues.apache.org/jira/browse/PDFBOX-2694
> Project: PDFBox
> Issue Type: Task
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Priority: Minor
> Labels: jpeg, twelvemonkeys
> Attachments: 176936-p154-2.jpg, 176936-p154.pdf, 485945.pdf,
> 573636.pdf, DCTFilter.java
>
>
> While working on PDFBOX-2128 I decided to try twelvemonkeys for JPEG reading
> and the first impression is excellent. It seems that the author is making a
> big effort in handling even the most broken JPEG files (similar to what we do
> with PDFs). This issue is to collect problem files and discuss all
> experiences and decide whether we should bundle twelvemonkeys with PDFBox or
> rather just recommend it as an optional solution.