[
https://issues.apache.org/jira/browse/COLLECTIONS-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831896#comment-16831896
]
Rohan Padhye commented on COLLECTIONS-714:
------------------------------------------
Upon further investigation I noticed that the problem lies in the way
PatriciaTrie uses the results of StringKeyAnalyzer.
The insertion of entry with key "x\u0000" when key "x" already exists, hits
this path in AbstractPatriciaTrie.put():
{code:java}
} else if (KeyAnalyzer.isEqualBitKey(bitIndex)) {
// This is a very special and rare case.
/* REPLACE OLD KEY+VALUE */
if (found != root) { // NOPMD
incrementModCount();
return found.setKeyValue(key, value);
}
}
{code}
I'm not sure what the "very special and rare case" is that the original
developer intended, but it incorrectly overwrites the found key "x" with new
key "x\u0000".
Why is the above branch executed? Because the call to
AbstractPatriciaTrie.bitIndex("x", "x\u0000") returns EQUAL_BIT_KEY, which
seems incorrect. The bug may be rooted in the way StringKeyAnalyzer.bitIndex()
handles cases where index1 >= endIndex1: with the placeholder value "0", which
is actually a valid character in Java.
I cannot think of a simple fix at the top of my head, since the entire
XOR-based algorithm seems to rely on the assumption that two non-equal keys
must have a bit-index where their corresponding bits differ. This algorithm
does not consider the fact that two keys can be non-equal simply because they
have different lengths.
> PatriciaTrie ignores trailing null characters in keys
> -----------------------------------------------------
>
> Key: COLLECTIONS-714
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-714
> Project: Commons Collections
> Issue Type: Bug
> Components: Collection, Map
> Affects Versions: 4.3
> Reporter: Rohan Padhye
> Priority: Critical
>
> In Java, strings are not null terminated. The string "x" (of length = 1 char)
> is different from the string "x\u0000" (of length = 2 chars). However,
> PatriciaTrie does not seem to distinguish between these strings.
> To reproduce:
> {code:java}
> public void testNullTerminatedKey1() {
> Map<String, Integer> map = new HashMap<>();
> map.put("x", 0); // key of length 1
> map.put("x\u0000", 1); // key of length 2
> map.put("x\u0000y", 2); // key of length 3
> Assert.assertEquals(3, map.size()); // ok, 3 distinct keys
> PatriciaTrie<Integer> trie = new PatriciaTrie<>(map);
> Assert.assertEquals(3, trie.size()); // fail; actual=2
> }{code}
> In the above example, the resulting trie has only two keys: "x\u0000" and
> "x\u0000y". The key "x" gets overwritten. Here is another way to repro the
> bug:
> {code:java}
> public void testNullTerminatedKey2() {
> PatriciaTrie<Integer> trie = new PatriciaTrie<>();
> trie.put("x", 0);
> Assert.assertTrue(trie.containsKey("x")); // ok
> trie.put("x\u0000", 1);
> Assert.assertTrue(trie.containsKey("x")); // fail
> }
> {code}
> In the above example, the key "x" suddenly disappears when an entry with key
> "x\u0000" is inserted.
> The PatriciaTrie docs do not mention anything about null-terminated strings.
> In general, I believe this also breaks the JDK Map contract since the keys
> "x".equals("x\u0000") is false.
> This bug was found automatically using JQF:
> [https://github.com/rohanpadhye/jqf].
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)