[
https://issues.apache.org/jira/browse/COLLECTIONS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Duncan updated COLLECTIONS-577:
-------------------------------------
Description:
I have a bug report for you, for the class AbstractPatriciaTrie. It has to do
with how you handle bits when they are very close to each other. For example,
some of your methods seem to think that if the only difference between a prefix
and a longer string, is a single additional bit, then they are actually the
same data. Or if the only difference is some number of zero bits, then it also
thinks they are the same data. There are also MANY situations where the
prefixMap does not return all the strings that start with the prefix.
Take the following code for example, which shows the bugs happening in a very
simple context.
I create two strings. The first string is always a prefix of the second
string. Here are the characters I am using:
final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
final char u8000 = Character.toChars(32768)[0]; // U+8000 (1000000000000000)
final char uffff = Character.toChars(65535)[0]; // U+FFFF (1111111111111111)
final char char_b = 'b'; // 1100010
final char char_c = 'c'; // 1100011
final PatriciaTrie<String> trie = new PatriciaTrie<>();
And here is a quick example of the trie working correctly, showing how it
should work:
final String prefixString = "" + char_b;
final String longerString = prefixString + char_c;
System.out.println(trie.prefixMap(prefixString)); // {b=prefixString,
bc=longerString} // correct!
In the first example, I show that a character who's bits are all zeros is
always mishandled:
final String prefixString = "" + char_b;
final String longerString = prefixString + u0000;
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // only longerString shown... prefixString has
disappeared
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {b =longerString} //
prefixString should be here, but isn't
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in again, but longerString has
now disappeared
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} //
longerString should be here, but isn't
Next, I show that if the longer string is only 1 bit longer (ignoring zeros)
than the prefix string, then the PatriciaTree fails to include it in the prefix
map.
Here the string would look like: 0000000001100011 1000000000000000
final String prefixString = "" + char_c; // you can use any character for
the prefix, same results
final String longerString = prefixString + u8000;
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // both are in
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
longerString should be here, but isn't
And again, except flipping it so that the prefix ends with a 1, and the longer
string starts with 1's:
final String prefixString = "" + u0003;
final String longerString = prefixString + u8000; // can also use uffff
here, same result
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // both are in
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
longerString should be here, but isn't
The class is honestly pretty complex and I wasn't able to completely debug why
this is behaving badly, but I believe it is because of how you are handling the
"bitIndex" and "bitLength".
For example, in the method AbstractPatriciaTrie .subtree(), my comments in bold
(this definitely isn't the only place with weird handling of bitIndex and
lengthInBits):
TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int
lengthInBits) {
TrieEntry<K, V> current = root.left;
TrieEntry<K, V> path = root;
while(true) {
// If our bit index has only increased by 1, then our bitIndex will never get
to be greater than lengthInBits, we could still start with the prefix, yet we
have no way to break out of here with current being set to the longer string.
if (current.bitIndex <= path.bitIndex || lengthInBits <
current.bitIndex) {
break;
}
path = current;
if (!isBitSet(prefix, offsetInBits + current.bitIndex, offsetInBits
+ lengthInBits)) {
current = current.left;
} else {
current = current.right;
}
}
...
Can you also make AbstractPatriciaTrie public, and your other package level
methods into protected level, that way I don't have to copy the entire class
and subclasse's code out into another class just to extend it?
thank you,
Chris Duncan (github user: VEQRYN)
was:
I have a bug report for you. It has to do with how you handle bits when they
are very close to each other. For example, some of your methods seem to think
that if the only difference between a prefix and a longer string, is a single
additional bit, then they are actually the same data. Or if the only
difference is some number of zero bits, then it also thinks they are the same
data. There are also MANY situations where the prefixMap does not return all
the strings that start with the prefix.
Take the following code for example, which shows the bugs happening in a very
simple context.
I create two strings. The first string is always a prefix of the second
string. Here are the characters I am using:
final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
final char u8000 = Character.toChars(32768)[0]; // U+8000 (1000000000000000)
final char uffff = Character.toChars(65535)[0]; // U+FFFF (1111111111111111)
final char char_b = 'b'; // 1100010
final char char_c = 'c'; // 1100011
final PatriciaTrie<String> trie = new PatriciaTrie<>();
And here is a quick example of the trie working correctly, showing how it
should work:
final String prefixString = "" + char_b;
final String longerString = prefixString + char_c;
System.out.println(trie.prefixMap(prefixString)); // {b=prefixString,
bc=longerString} // correct!
In the first example, I show that a character who's bits are all zeros is
always mishandled:
final String prefixString = "" + char_b;
final String longerString = prefixString + u0000;
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // only longerString shown... prefixString has
disappeared
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {b =longerString} //
prefixString should be here, but isn't
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in again, but longerString has
now disappeared
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} //
longerString should be here, but isn't
Next, I show that if the longer string is only 1 bit longer (ignoring zeros)
than the prefix string, then the PatriciaTree fails to include it in the prefix
map.
Here the string would look like: 0000000001100011 1000000000000000
final String prefixString = "" + char_c; // you can use any character for
the prefix, same results
final String longerString = prefixString + u8000;
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // both are in
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
longerString should be here, but isn't
And again, except flipping it so that the prefix ends with a 1, and the longer
string starts with 1's:
final String prefixString = "" + u0003;
final String longerString = prefixString + u8000; // can also use uffff
here, same result
System.out.println(prefixString);
System.out.println(prefixString.length()); // 1
System.out.println(longerString);
System.out.println(longerString.length()); // 2
System.out.println(longerString.startsWith(prefixString)); // true
trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in
trie.put(longerString, "longerString");
System.out.println(trie); // both are in
System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
should be 2
System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
longerString should be here, but isn't
The class is honestly pretty complex and I wasn't able to completely debug why
this is behaving badly, but I believe it is because of how you are handling the
"bitIndex" and "bitLength".
For example, in the method AbstractPatriciaTrie .subtree(), my comments in bold
(this definitely isn't the only place with weird handling of bitIndex and
lengthInBits):
TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int
lengthInBits) {
TrieEntry<K, V> current = root.left;
TrieEntry<K, V> path = root;
while(true) {
// If our bit index has only increased by 1, then our bitIndex will never get
to be greater than lengthInBits, we could still start with the prefix, yet we
have no way to break out of here with current being set to the longer string.
if (current.bitIndex <= path.bitIndex || lengthInBits <
current.bitIndex) {
break;
}
path = current;
if (!isBitSet(prefix, offsetInBits + current.bitIndex, offsetInBits
+ lengthInBits)) {
current = current.left;
} else {
current = current.right;
}
}
...
Can you also make AbstractPatriciaTrie public, and your other package level
methods into protected level, that way I don't have to copy the entire class
and subclasse's code out into another class just to extend it?
thank you,
Chris Duncan (github user: VEQRYN)
> PatriciaTrie bugs when only a few bits change
> ---------------------------------------------
>
> Key: COLLECTIONS-577
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-577
> Project: Commons Collections
> Issue Type: Bug
> Components: Map
> Affects Versions: 4.0
> Reporter: Chris Duncan
> Priority: Critical
>
> I have a bug report for you, for the class AbstractPatriciaTrie. It has to
> do with how you handle bits when they are very close to each other. For
> example, some of your methods seem to think that if the only difference
> between a prefix and a longer string, is a single additional bit, then they
> are actually the same data. Or if the only difference is some number of zero
> bits, then it also thinks they are the same data. There are also MANY
> situations where the prefixMap does not return all the strings that start
> with the prefix.
> Take the following code for example, which shows the bugs happening in a very
> simple context.
> I create two strings. The first string is always a prefix of the second
> string. Here are the characters I am using:
> final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
> final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
> final char u8000 = Character.toChars(32768)[0]; // U+8000
> (1000000000000000)
> final char uffff = Character.toChars(65535)[0]; // U+FFFF
> (1111111111111111)
> final char char_b = 'b'; // 1100010
> final char char_c = 'c'; // 1100011
> final PatriciaTrie<String> trie = new PatriciaTrie<>();
> And here is a quick example of the trie working correctly, showing how it
> should work:
> final String prefixString = "" + char_b;
> final String longerString = prefixString + char_c;
> System.out.println(trie.prefixMap(prefixString)); // {b=prefixString,
> bc=longerString} // correct!
> In the first example, I show that a character who's bits are all zeros is
> always mishandled:
> final String prefixString = "" + char_b;
> final String longerString = prefixString + u0000;
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // only longerString shown... prefixString has
> disappeared
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {b =longerString} //
> prefixString should be here, but isn't
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in again, but longerString
> has now disappeared
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} //
> longerString should be here, but isn't
> Next, I show that if the longer string is only 1 bit longer (ignoring zeros)
> than the prefix string, then the PatriciaTree fails to include it in the
> prefix map.
> Here the string would look like: 0000000001100011 1000000000000000
> final String prefixString = "" + char_c; // you can use any character
> for the prefix, same results
> final String longerString = prefixString + u8000;
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // both are in
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
> longerString should be here, but isn't
> And again, except flipping it so that the prefix ends with a 1, and the
> longer string starts with 1's:
> final String prefixString = "" + u0003;
> final String longerString = prefixString + u8000; // can also use uffff
> here, same result
> System.out.println(prefixString);
> System.out.println(prefixString.length()); // 1
> System.out.println(longerString);
> System.out.println(longerString.length()); // 2
> System.out.println(longerString.startsWith(prefixString)); // true
> trie.put(prefixString, "prefixString");
> System.out.println(trie); // prefixString is in
> trie.put(longerString, "longerString");
> System.out.println(trie); // both are in
> System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but
> should be 2
> System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} //
> longerString should be here, but isn't
> The class is honestly pretty complex and I wasn't able to completely debug
> why this is behaving badly, but I believe it is because of how you are
> handling the "bitIndex" and "bitLength".
> For example, in the method AbstractPatriciaTrie .subtree(), my comments in
> bold (this definitely isn't the only place with weird handling of bitIndex
> and lengthInBits):
> TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int
> lengthInBits) {
> TrieEntry<K, V> current = root.left;
> TrieEntry<K, V> path = root;
> while(true) {
> // If our bit index has only increased by 1, then our bitIndex will never get
> to be greater than lengthInBits, we could still start with the prefix, yet we
> have no way to break out of here with current being set to the longer string.
> if (current.bitIndex <= path.bitIndex || lengthInBits <
> current.bitIndex) {
> break;
> }
> path = current;
> if (!isBitSet(prefix, offsetInBits + current.bitIndex,
> offsetInBits + lengthInBits)) {
> current = current.left;
> } else {
> current = current.right;
> }
> }
> ...
> Can you also make AbstractPatriciaTrie public, and your other package level
> methods into protected level, that way I don't have to copy the entire class
> and subclasse's code out into another class just to extend it?
> thank you,
> Chris Duncan (github user: VEQRYN)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)