[ 
https://issues.apache.org/jira/browse/COLLECTIONS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908803#comment-14908803
 ] 

Adrian Crum commented on COLLECTIONS-577:
-----------------------------------------

Please don't put pages of text in the issue description. It uses a lot of 
bandwith and makes reading email notifications difficult. The description 
should contain a brief description of the problem, and code examples should go 
in the comments.

> PatriciaTrie bugs when only a few bits change
> ---------------------------------------------
>
>                 Key: COLLECTIONS-577
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-577
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Map
>    Affects Versions: 4.0
>            Reporter: Chris Duncan
>            Priority: Critical
>
> I have a bug report for you, for the class AbstractPatriciaTrie.  It has to 
> do with how you handle bits when they are very close to each other.  For 
> example, some of your methods seem to think that if the only difference 
> between a prefix and a longer string, is a single additional bit, then they 
> are actually the same data.  Or if the only difference is some number of zero 
> bits, then it also thinks they are the same data.  There are also MANY 
> situations where the prefixMap does not return all the strings that start 
> with the prefix.
> Take the following code for example, which shows the bugs happening in a very 
> simple context.
> I create two strings.  The first string is always a prefix of the second 
> string.  Here are the characters I am using:
>     final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
>     final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
>     final char u8000 = Character.toChars(32768)[0]; // U+8000 
> (1000000000000000)
>     final char uffff = Character.toChars(65535)[0]; // U+FFFF 
> (1111111111111111)
>     final char char_b = 'b'; // 1100010
>     final char char_c = 'c'; // 1100011
>     final PatriciaTrie<String> trie = new PatriciaTrie<>();
> And here is a quick example of the trie working correctly, showing how it 
> should work:
>     final String prefixString = "" + char_b;
>     final String longerString = prefixString + char_c;
>     System.out.println(trie.prefixMap(prefixString)); // {b=prefixString, 
> bc=longerString} // correct!
> In the first example, I show that a character who's bits are all zeros is 
> always mishandled:
>     final String prefixString = "" + char_b;
>     final String longerString = prefixString + u0000;
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // only longerString shown... prefixString has 
> disappeared
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but 
> should be 2
>     System.out.println(trie.prefixMap(prefixString)); // {b =longerString} // 
> prefixString should be here, but isn't
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in again, but longerString 
> has now disappeared
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but 
> should be 2
>     System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} // 
> longerString should be here, but isn't
> Next, I show that if the longer string is only 1 bit longer (ignoring zeros) 
> than the prefix string, then the PatriciaTree fails to include it in the 
> prefix map.
> Here the string would look like: 0000000001100011 1000000000000000
>     final String prefixString = "" + char_c;  // you can use any character 
> for the prefix, same results
>     final String longerString = prefixString + u8000;
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // both are in
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but 
> should be 2
>     System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} // 
> longerString should be here, but isn't
> And again, except flipping it so that the prefix ends with a 1, and the 
> longer string starts with 1's:
>     final String prefixString = "" + u0003;
>     final String longerString = prefixString + u8000; // can also use uffff 
> here, same result
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // both are in
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but 
> should be 2
>     System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} // 
> longerString should be here, but isn't
> The class is honestly pretty complex and I wasn't able to completely debug 
> why this is behaving badly, but I believe it is because of how you are 
> handling the "bitIndex" and "bitLength".
> For example, in the method AbstractPatriciaTrie .subtree(), my comments in 
> bold (this definitely isn't the only place with weird handling of bitIndex 
> and lengthInBits):
>     TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int 
> lengthInBits) {
>         TrieEntry<K, V> current = root.left;
>         TrieEntry<K, V> path = root;
>         while(true) {
> // If our bit index has only increased by 1, then our bitIndex will never get 
> to be greater than lengthInBits, we could still start with the prefix, yet we 
> have no way to break out of here with current being set to the longer string.
>             if (current.bitIndex <= path.bitIndex || lengthInBits < 
> current.bitIndex) {
>                 break;
>             }
>             path = current;
>             if (!isBitSet(prefix, offsetInBits + current.bitIndex, 
> offsetInBits + lengthInBits)) {
>                 current = current.left;
>             } else {
>                 current = current.right;
>             }
>         }
>       ...
> Can you also make AbstractPatriciaTrie public, and your other package level 
> methods into protected level, that way I don't have to copy the entire class 
> and subclasse's code out into another class just to extend it?
> thank you,
> Chris Duncan (github user: VEQRYN)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to