[
https://issues.apache.org/jira/browse/COLLECTIONS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908803#comment-14908803
]
Adrian Crum commented on COLLECTIONS-577:
-----------------------------------------
Please don't put pages of text in the issue description. It uses a lot of
bandwith and makes reading email notifications difficult. The description
should contain a brief description of the problem, and code examples should go
in the comments.
> PatriciaTrie bugs when only a few bits change
> ---------------------------------------------
>
> Key: COLLECTIONS-577
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-577
> Project: Commons Collections
> Issue Type: Bug
> Components: Map
> Affects Versions: 4.0
> Reporter: Chris Duncan
> Priority: Critical
>
> I have a bug report for you, for the class AbstractPatriciaTrie. It has to
> do with how you handle bits when they are very close to each other. For
> example, some of your methods seem to think that if the only difference
> between a prefix and a longer string, is a single additional bit, then they
> are actually the same data. Or if the only difference is some number of zero
> bits, then it also thinks they are the same data. There are also MANY
> situations where the prefixMap does not return all the strings that start
> with the prefix.
> Take the following code for example, which shows the bugs happening in a very
> simple context.
> I create two strings. The first string is always a prefix of the second
> string. Here are the characters I am using:
> final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
> final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
> final char u8000 = Character.toChars(32768)[0]; // U+8000
> (1000000000000000)
> final char uffff = Character.toChars(65535)[0]; // U+FFFF
> (1111111111111111)
> final char char_b = 'b'; // 1100010
> final char char_c = 'c'; // 1100011
> final PatriciaTrie<String> trie = new PatriciaTrie<>();
> And here is a quick example of the trie working correctly, showing how it
> should work:
> final String prefixString = "" + char_b;
> final String longerString = prefixString + char_c;
> System.out.println(trie.prefixMap(prefixString)); // {b=prefixString,
> bc=longerString} // correct!
> In the first example, I show that a character who's bits are all zeros is
> always mishandled:
> final String prefixString = "" + char_b;
> final String longerString = prefixString + u0000;
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // only longerString shown... prefixString has
> disappeared
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {b =longerString} //
> prefixString should be here, but isn't
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in again, but longerString
> has now disappeared
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} //
> longerString should be here, but isn't
> Next, I show that if the longer string is only 1 bit longer (ignoring zeros)
> than the prefix string, then the PatriciaTree fails to include it in the
> prefix map.
> Here the string would look like: 0000000001100011 1000000000000000
> final String prefixString = "" + char_c; // you can use any character
> for the prefix, same results
> final String longerString = prefixString + u8000;
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // both are in
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
> longerString should be here, but isn't
> And again, except flipping it so that the prefix ends with a 1, and the
> longer string starts with 1's:
> final String prefixString = "" + u0003;
> final String longerString = prefixString + u8000; // can also use uffff
> here, same result
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // both are in
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
> longerString should be here, but isn't
> The class is honestly pretty complex and I wasn't able to completely debug
> why this is behaving badly, but I believe it is because of how you are
> handling the "bitIndex" and "bitLength".
> For example, in the method AbstractPatriciaTrie .subtree(), my comments in
> bold (this definitely isn't the only place with weird handling of bitIndex
> and lengthInBits):
> TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int
> lengthInBits) {
> TrieEntry<K, V> current = root.left;
> TrieEntry<K, V> path = root;
> while(true) {
> // If our bit index has only increased by 1, then our bitIndex will never get
> to be greater than lengthInBits, we could still start with the prefix, yet we
> have no way to break out of here with current being set to the longer string.
> if (current.bitIndex <= path.bitIndex || lengthInBits <
> current.bitIndex) {
> break;
> }
> path = current;
> if (!isBitSet(prefix, offsetInBits + current.bitIndex,
> offsetInBits + lengthInBits)) {
> current = current.left;
> } else {
> current = current.right;
> }
> }
> ...
> Can you also make AbstractPatriciaTrie public, and your other package level
> methods into protected level, that way I don't have to copy the entire class
> and subclasse's code out into another class just to extend it?
> thank you,
> Chris Duncan (github user: VEQRYN)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)