Github user JamesRTaylor commented on a diff in the pull request:
https://github.com/apache/incubator-phoenix/pull/8#discussion_r10141720
--- Diff:
phoenix-core/src/test/java/org/apache/phoenix/end2end/ArrayTest.java ---
@@ -393,6 +393,73 @@ public void testSelectWithArrayWithColumnRef() throws
Exception {
}
--- End diff --
Yes, the serializeNulls method isn't correct (see below). High level goal
is to ensure that the arrays sort based on their byte array. The particular
issue here is how to encode nulls that occur in the middle such that this
remains true. Lastly, the only triple 0 byte should occur at the end (as this
is our terminal signal that ensures we'll never use the header information at
the end for sorting, unless, of course, the arrays are equal in which case the
header info is the same so this is fine). It's three 0 bytes because of the
terminator byte of the last element plus the two 0 bytes you add at the end.
Remember, trailing null bytes are never stored, so the last element has to be
non null.
So, one issue was that the nullCount byte needs to be inverted. Another new
issue is that we need to mod with 255, not 256. The reason is that we need the
nullCount byte to not be zero (and thus we need to offset it by one). If we let
it be zero, then you'd have the possibility of three null bytes in a row which
potentially breaks our comparability. A third issue, is that you need to write
a 01 byte for every multiple of 255 nulls there are which wasn't happening
before. So to do this, do the following:
public static int serializeNulls(DataOutputStream oStream, int nulls)
throws IOException {
if (nulls > 0) {
oStream.write(QueryConstants.SEPARATOR_BYTE);
int nMultiplesOver255 = nulls / 255;
while (nMultiplesOver255-- > 0) {
// Don't write a zero byte, as we need to ensure that the
only triple zero
// byte occurs at the end of the array (i.e. the terminator
byte for the
// element plus the double zero byte at the end of the
array).
oStream.write((byte)1);
}
int nRemainingNulls = nulls % 255; // From 0 to 254
// Write a byte for the remaining null elements
if (nRemainingNulls > 0) {
// Invert so that more remaining nulls becomes smaller than
less remaining nulls
// The reason is that the less remaining null array has an
element after which would
// line up with a not null element in an array with fewer
null elements.
byte nNullByte = SortOrder.invert((byte)nRemainingNulls);
oStream.write(nNullByte); // Single byte for repeating nulls
}
}
return 0;
}
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---