Alex Herbert created CODEC-267:
----------------------------------
Summary: MurmurHash3.hash32() does not process trailing bytes as
unsigned
Key: CODEC-267
URL: https://issues.apache.org/jira/browse/CODEC-267
Project: Commons Codec
Issue Type: Bug
Affects Versions: 1.13
Reporter: Claude Warren
Assignee: Alex Herbert
The hash32() algorithm processes the input in blocks of 4 bytes. The final 1, 2
or 3 trailing bytes are not masked to unsigned, so a negative trailing byte is
sign-extended when widened to int and the resulting hash is incorrect.
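The effect of the missing mask can be seen in isolation. The following is a minimal sketch, not the library code: the class {{TrailingByteDemo}} and its two methods are hypothetical names used only for illustration, and the unmasked form is assumed to mirror what hash32() does with a single trailing byte.

```java
public class TrailingByteDemo {

    /** XORs one trailing byte into k1 without masking (assumed current behaviour). */
    public static int tailSigned(byte b) {
        int k1 = 0;
        k1 ^= b; // the byte is sign-extended to int: -1 becomes 0xffffffff
        return k1;
    }

    /** XORs one trailing byte into k1 after masking it to the range 0..255. */
    public static int tailUnsigned(byte b) {
        int k1 = 0;
        k1 ^= b & 0xff; // only the low 8 bits survive: -1 becomes 0x000000ff
        return k1;
    }

    public static void main(String[] args) {
        byte b = -1;
        System.out.println(Integer.toHexString(tailSigned(b)));   // ffffffff
        System.out.println(Integer.toHexString(tailUnsigned(b))); // ff
    }
}
```

For non-negative bytes both forms agree, which is why the problem only appears when a trailing byte is negative.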
The following test passes. The expected values were generated with the Python
mmh3 library, which calls the MurmurHash3 C++ code (modified for Python), so the
test shows that hash32() does not reproduce the reference hashes:
{code:java}
/**
 * Test to demonstrate the errors in
 * {@link MurmurHash3#hash32(byte[], int, int, int)}
 * if the final 1, 2, or 3 bytes are negative.
 */
@Test
public void testHash32With1TrailingSignedByteIsInvalid() {
    // Generate test data:
    // import mmh3
    // import numpy as np
    // mmh3.hash(np.uint8([-1]))
    // mmh3.hash(np.uint8([0, -1]))
    // mmh3.hash(np.uint8([0, 0, -1]))
    // mmh3.hash(np.uint8([-1, 0]))
    // mmh3.hash(np.uint8([-1, 0, 0]))
    // mmh3.hash(np.uint8([0, -1, 0]))
    // The length argument must cover the whole array so that the
    // negative byte is part of the hashed data.
    Assert.assertNotEquals(-43192051,
        MurmurHash3.hash32(new byte[] {-1}, 0, 1, 0));
    Assert.assertNotEquals(-582037868,
        MurmurHash3.hash32(new byte[] {0, -1}, 0, 2, 0));
    Assert.assertNotEquals(922088087,
        MurmurHash3.hash32(new byte[] {0, 0, -1}, 0, 3, 0));
    Assert.assertNotEquals(-1309567588,
        MurmurHash3.hash32(new byte[] {-1, 0}, 0, 2, 0));
    Assert.assertNotEquals(-363779670,
        MurmurHash3.hash32(new byte[] {-1, 0, 0}, 0, 3, 0));
    Assert.assertNotEquals(-225068062,
        MurmurHash3.hash32(new byte[] {0, -1, 0}, 0, 3, 0));
}
{code}
This test passes with {{assertEquals}} when the code is fixed to mask each of
the up to 3 trailing bytes:
{code:java}
case 3:
    k1 ^= (data[index + 2] & 0xff) << 16;
    // fall through
case 2:
    k1 ^= (data[index + 1] & 0xff) << 8;
    // fall through
case 1:
    k1 ^= (data[index] & 0xff);
{code}
Fixing this error in place would be a behavioural change: hash32() would return
different values for any input whose trailing bytes are negative.
It is recommended to leave this method alone and implement a new hash32x86
method that should match the {{MurmurHash3_x86_32}} method from the C++ source
code.
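As a rough illustration of what such a method could look like, the sketch below follows the structure of {{MurmurHash3_x86_32}} (block mixing, masked tail, finalization). It is not the Commons Codec implementation: the class name {{Murmur3Sketch}} is hypothetical, and the signature of {{hash32x86}} is only assumed to mirror {{hash32(byte[], int, int, int)}}.

```java
public class Murmur3Sketch {

    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** Hypothetical hash32x86: 32-bit MurmurHash3 with unsigned trailing bytes. */
    public static int hash32x86(byte[] data, int offset, int length, int seed) {
        int hash = seed;
        final int nblocks = length >> 2;

        // Body: mix each 4-byte block, read as a little-endian int.
        for (int i = 0; i < nblocks; i++) {
            final int index = offset + (i << 2);
            int k = (data[index] & 0xff)
                | ((data[index + 1] & 0xff) << 8)
                | ((data[index + 2] & 0xff) << 16)
                | ((data[index + 3] & 0xff) << 24);
            k *= C1;
            k = Integer.rotateLeft(k, 15);
            k *= C2;
            hash ^= k;
            hash = Integer.rotateLeft(hash, 13);
            hash = hash * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1, 2 or 3 bytes, each masked to unsigned.
        final int index = offset + (nblocks << 2);
        int k1 = 0;
        switch (offset + length - index) {
        case 3:
            k1 ^= (data[index + 2] & 0xff) << 16;
            // fall through
        case 2:
            k1 ^= (data[index + 1] & 0xff) << 8;
            // fall through
        case 1:
            k1 ^= (data[index] & 0xff);
            k1 *= C1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= C2;
            hash ^= k1;
            break;
        default:
            break;
        }

        // Finalization: fold in the length and force the bits to avalanche.
        hash ^= length;
        hash ^= hash >>> 16;
        hash *= 0x85ebca6b;
        hash ^= hash >>> 13;
        hash *= 0xc2b2ae35;
        hash ^= hash >>> 16;
        return hash;
    }

    public static void main(String[] args) {
        // Reference value taken from the mmh3 data in the test above.
        System.out.println(hash32x86(new byte[] {-1}, 0, 1, 0)); // -43192051
    }
}
```

Keeping this as a separate method leaves the values returned by the existing hash32() untouched for callers that already depend on them.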
--
This message was sent by Atlassian Jira
(v8.3.4#803005)