Hi all,
The changes I made won't help, because Tika passes the "ocrImageQuality"
value 1 to ImageIOUtil.write().
It may be possible to pass the build tests on JDK 11 and later by
changing the configuration: "ocrImageQuality" in ParserConfig.java
should be 0, not 1, for "png".
I tested this with PDFToImage and the resulting PNG files are now
identical. But this could be different on other JDK versions.
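A minimal sketch of the mechanism, assuming (this is not confirmed in the thread) that ImageIOUtil.write() ends up handing the quality value to the ImageIO PNG writer as an explicit compression quality. As I understand it, the JDK 8 PNG writer reports that it cannot write compressed output, so the value is ignored there, while JDK 9 and later use it to pick the deflate level. The encoded bytes (and therefore any digest of them) can then change while the decoded pixels stay the same. PngQualitySketch and its methods are hypothetical names for illustration only.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper, not part of Tika or PDFBox: encodes an image as PNG
// with an explicit ImageIO compression quality (assumed to be roughly what
// happens when a quality value is passed through to the PNG writer).
class PngQualitySketch {

    static byte[] writePng(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        // Only applied when the writer supports compression settings;
        // otherwise the quality value is silently ignored.
        if (param.canWriteCompressed()) {
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            String[] types = param.getCompressionTypes();
            if (param.getCompressionType() == null && types != null) {
                param.setCompressionType(types[0]);
            }
            param.setCompressionQuality(quality);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    static BufferedImage readPng(byte[] png) {
        try {
            return ImageIO.read(new ByteArrayInputStream(png));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if both images have the same size and identical RGB values.
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Writing the same image with quality 0 and quality 1 and comparing the byte arrays shows whether the running JDK honours the setting; the pixels decoded from either file should match the source image either way, since PNG is lossless.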
In the long run, though, you should check size and pixel equality
rather than file digests, as is done in PDFBox's TestPDFToImage.java:
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.io.IOException;

    /**
     * Create an image; the part between the smaller and the larger image is
     * painted black, the rest in white.
     *
     * @param minWidth width of the smaller image
     * @param minHeight height of the smaller image
     * @param maxWidth width of the larger image
     * @param maxHeight height of the larger image
     *
     * @return an image of size maxWidth x maxHeight
     */
    private BufferedImage createEmptyDiffImage(int minWidth, int minHeight, int maxWidth, int maxHeight)
    {
        BufferedImage bim3 = new BufferedImage(maxWidth, maxHeight, BufferedImage.TYPE_INT_RGB);
        Graphics graphics = bim3.getGraphics();
        if (minWidth != maxWidth || minHeight != maxHeight)
        {
            // sizes differ: start with an all-black image...
            graphics.setColor(Color.BLACK);
            graphics.fillRect(0, 0, maxWidth, maxHeight);
        }
        // ...then paint the area both images share white
        graphics.setColor(Color.WHITE);
        graphics.fillRect(0, 0, minWidth, minHeight);
        graphics.dispose();
        return bim3;
    }
    /**
     * Get the difference between two images. Identical (or nearly identical)
     * pixels are set to white; for differing pixels, the per-channel absolute
     * differences are subtracted from white, so larger differences are darker.
     *
     * @param bim1 the first image
     * @param bim2 the second image
     * @return null if the images are identical (ignoring differences of up to
     * 3 per channel), otherwise a diff image. If the sizes differ, a black
     * border is created on the bottom and the right.
     *
     * @throws IOException declared, but not thrown by this implementation
     */
    private BufferedImage diffImages(BufferedImage bim1, BufferedImage bim2) throws IOException
    {
        int minWidth = Math.min(bim1.getWidth(), bim2.getWidth());
        int minHeight = Math.min(bim1.getHeight(), bim2.getHeight());
        int maxWidth = Math.max(bim1.getWidth(), bim2.getWidth());
        int maxHeight = Math.max(bim1.getHeight(), bim2.getHeight());
        BufferedImage bim3 = null;
        if (minWidth != maxWidth || minHeight != maxHeight)
        {
            // different sizes always count as a difference
            bim3 = createEmptyDiffImage(minWidth, minHeight, maxWidth, maxHeight);
        }
        for (int x = 0; x < minWidth; ++x)
        {
            for (int y = 0; y < minHeight; ++y)
            {
                int rgb1 = bim1.getRGB(x, y);
                int rgb2 = bim2.getRGB(x, y);
                if (rgb1 != rgb2
                        // don't bother about small differences (up to 3 per channel)
                        && (Math.abs((rgb1 & 0xFF) - (rgb2 & 0xFF)) > 3
                        || Math.abs(((rgb1 >> 8) & 0xFF) - ((rgb2 >> 8) & 0xFF)) > 3
                        || Math.abs(((rgb1 >> 16) & 0xFF) - ((rgb2 >> 16) & 0xFF)) > 3))
                {
                    if (bim3 == null)
                    {
                        bim3 = createEmptyDiffImage(minWidth, minHeight, maxWidth, maxHeight);
                    }
                    // per-channel absolute differences, each kept in its own byte;
                    // getRGB() returns 0xAARRGGBB, so despite the names r holds
                    // the blue difference and b the red one
                    int r = Math.abs((rgb1 & 0xFF) - (rgb2 & 0xFF));
                    int g = Math.abs((rgb1 & 0xFF00) - (rgb2 & 0xFF00));
                    int b = Math.abs((rgb1 & 0xFF0000) - (rgb2 & 0xFF0000));
                    bim3.setRGB(x, y, 0xFFFFFF - (r | g | b));
                }
                else
                {
                    if (bim3 != null)
                    {
                        bim3.setRGB(x, y, Color.WHITE.getRGB());
                    }
                }
            }
        }
        return bim3;
    }
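For the unit test, a yes/no variant of the same check might be enough: compare sizes, then allow a small per-channel difference (3, as in diffImages() above). This is a sketch, not existing Tika or PDFBox code; ImageAssertSketch and imagesMatch() are hypothetical names.

```java
import java.awt.image.BufferedImage;

// Hypothetical test helper (not an existing Tika or PDFBox API): a boolean
// version of the diffImages() check, usable in an assertTrue(...) in place
// of comparing digests of extracted images.
class ImageAssertSketch {

    // Returns true if both images have the same size and every pixel's
    // red, green and blue channels differ by at most `tolerance`.
    static boolean imagesMatch(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int rgb1 = a.getRGB(x, y);
                int rgb2 = b.getRGB(x, y);
                // Check each 8-bit channel: shift 0 = blue, 8 = green, 16 = red.
                for (int shift = 0; shift <= 16; shift += 8) {
                    int c1 = (rgb1 >> shift) & 0xFF;
                    int c2 = (rgb2 >> shift) & 0xFF;
                    if (Math.abs(c1 - c2) > tolerance) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

In a JUnit test this could replace the digest assertion with something like assertTrue(ImageAssertSketch.imagesMatch(expected, actual, 3)), where both images are decoded with ImageIO.read(). The tolerance keeps the test robust if a future JDK also changes decoding slightly, not just encoding.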
Tilman
On 21.09.2019 at 14:45, Tim Allison wrote:
Thank you, Tilman!
https://issues.apache.org/jira/plugins/servlet/mobile#issue/PDFBOX-4655
Should we change our unit test so it no longer requires digest equality?
:( Any other simple options?
On Fri, Sep 20, 2019 at 11:57 AM Sergey Beryozkin <[email protected]> wrote:
Is it the message digest of some PDF content? Maybe it has to do with
some MessageDigest enhancements in Java 11? I haven't found anything
specific, but this is one possible line of inquiry. Or maybe the default
hashCode() has changed, which could also affect the collection hashCode()
and the total digest.
Cheers, Sergey
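On the MessageDigest question: MD5 is fully specified by RFC 1321 and does not involve Object.hashCode(), so for the same input bytes it gives the same result on Java 8 and Java 11. A changed digest therefore points to different input bytes, which fits Tilman's finding about the PNG encoding. The sketch assumes the test hashes the raw bytes of each extracted image (the 32-hex-digit values in the failure look like MD5); DigestSketch and md5Hex() are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: MD5 of a byte array as a 32-character lowercase hex string, the
// same shape as the values in the failing assertion. Assumption: the test
// hashes the raw bytes of each extracted image file.
class DigestSketch {

    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                // %02x formats a negative byte as its unsigned value
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with the standard JDK providers
            throw new IllegalStateException(e);
        }
    }
}
```

For example, md5Hex("abc".getBytes(StandardCharsets.US_ASCII)) returns 900150983cd24fb0d6963f7d28e17f72, the RFC 1321 test value, on any JDK.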
On Fri, Sep 20, 2019 at 1:43 PM Tim Allison <[email protected]> wrote:
PDFBox Colleagues,
Do you know of any diffs between Java 8 and 11 that would affect the
extraction of images from PDFs? Dan is getting a build failure
because of a hash mismatch.
Thank you.
Best,
Tim
On Fri, Sep 20, 2019 at 8:39 AM Dan Becker <[email protected]> wrote:
When one installs "sudo apt install -y default-jre" on Ubuntu 16.04, the
Tika build will be successful with the following tools:
vagrant@ubuntu-xenial:~/tika$ javac -version
javac 1.8.0_222
vagrant@ubuntu-xenial:~/tika$ java -version
openjdk version "1.8.0_222"
When one installs "sudo apt install -y default-jre" on Ubuntu 18.04, the
Tika build will FAIL with the following tools:
vagrant@ubuntu-bionic:~/tika$ javac -version
javac 11.0.4
vagrant@ubuntu-bionic:~/tika$ java -version
openjdk version "11.0.4" 2019-07-16
When one installs "sudo apt install -y openjdk-8-jdk" on Ubuntu 18.04, the
Tika build will FAIL with the following tools (note that the compiler has
been changed to JDK 8, but java has not):
vagrant@ubuntu-bionic:~/tika$ javac -version
javac 1.8.0_222
vagrant@ubuntu-bionic:~/tika$ java -version
openjdk version "11.0.4" 2019-07-16
If you switch the Java version with "echo 2 | sudo update-alternatives
--config java" (note: "2" works for a clean bionic vagrant VM, but YMMV),
then the Tika build will be successful with the following tools:
vagrant@ubuntu-bionic:~/tika$ javac -version
javac 1.8.0_222
vagrant@ubuntu-bionic:~/tika$ java -version
openjdk version "1.8.0_222"
I suspect that there is some new issue with running Tika under OpenJDK
11.0.4. I will continue to look for the root cause next week.
Dan
C: 301-524-8899
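Dan's third case (javac 1.8, java 11) fails while the fourth (both 1.8) passes, which suggests the runtime, not the compiler, is what matters. A small sketch for checking which JVM actually runs the code; the system properties are standard, while featureVersion() is a hypothetical helper that also handles the pre-Java 9 "1.8" numbering.

```java
// Sketch: report which JVM is running, since javac -version only describes
// the compiler. featureVersion() is a hypothetical helper, not a JDK API.
class JavaVersionSketch {

    // "1.8" -> 8, "1.8.0_222" -> 8, "11" -> 11, "11.0.4" -> 11
    static int featureVersion(String version) {
        String v = version.startsWith("1.") ? version.substring(2) : version;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("feature      = " + featureVersion(spec));
    }
}
```

Running this with the same java that Maven picks up (or printing the same properties from a test) would show whether the tests really ran on 8 or 11.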
On Thu, Sep 19, 2019 at 1:00 PM Dan Becker <[email protected]> wrote:
It is a clean checkout and build with no local changes.
I tested against the stock Ubuntu 16.04, and all the tests passed. The
only difference from the command sequence listed in the first email is
"vagrant init ubuntu/xenial64".
I retested Ubuntu 18.04 on a different host (different version of
vagrant, etc.), and I got the same error, so the problem does reproduce.
I will try to debug it further to determine the root cause.
Dan
On Thu, Sep 19, 2019 at 5:07 AM Nick Burch <[email protected]> wrote:
On Wed, 18 Sep 2019, Dan Becker wrote:
I am trying to build the master branch on Ubuntu 18.04, but I am getting
the following error:
[ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.409 s <<< FAILURE! - in org.apache.tika.server.UnpackerResourceTest
[ERROR] testPDFImages(org.apache.tika.server.UnpackerResourceTest) Time elapsed: 0.366 s <<< FAILURE!
org.junit.ComparisonFailure: expected:<[7c2f14acbb737672a1245f4ceb50622a]> but was:<[58b8269d1a584b7e8c1adcb936123923]>
    at org.apache.tika.server.UnpackerResourceTest.testPDFImages(UnpackerResourceTest.java:208)
Have you made any local changes first? Anything that might've been merged
in locally?
I'm building on Ubuntu 18.04 with Java 11, and the build completes fine
for me with no errors. I'm pretty sure some/most of our build servers are
Ubuntu too, so I'm not sure what's wrong for you...
Nick