Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/incubator-phoenix/pull/8#discussion_r10141720
  
    --- Diff: 
phoenix-core/src/test/java/org/apache/phoenix/end2end/ArrayTest.java ---
    @@ -393,6 +393,73 @@ public void testSelectWithArrayWithColumnRef() throws 
Exception {
         }
         
    --- End diff --
    
    Yes, the serializeNulls method isn't correct (see below). High level goal 
is to ensure that the arrays sort based on their byte array. The particular 
issue here is how to encode nulls that occur in the middle such that this 
remains true. Lastly, the only triple 0 byte should occur at the end (as this 
is our terminal signal that ensures we'll never use the header information at 
the end for sorting, unless, of course, the arrays are equal in which case the 
header info is the same so this is fine). It's three 0 bytes because of the 
terminator byte of the last element plus the two 0 bytes you add at the end. 
Remember, trailing null bytes are never stored, so the last element has to be 
non null.
    
    So, one issue was that the nullCount byte needs to be inverted. Another new 
issue is that we need to mod with 255, not 256. The reason is that we need the 
nullCount byte to not be zero (and thus we need to offset it by one). If we let 
it be zero, then you'd have the possibility of three null bytes in a row which 
potentially breaks our comparability. A third issue, is that you need to write 
a 01 byte for every multiple of 255 nulls there are which wasn't happening 
before. So to do this, do the following:
    
        public static int serializeNulls(DataOutputStream oStream, int nulls) 
throws IOException {
            if (nulls > 0) {
                oStream.write(QueryConstants.SEPARATOR_BYTE);
                int nMultiplesOver255 = nulls / 255;
                while (nMultiplesOver255-- > 0) {
                    // Don't write a zero byte, as we need to ensure that the 
only triple zero
                    // byte occurs at the end of the array (i.e. the terminator 
byte for the
                    // element plus the double zero byte at the end of the 
array).
                    oStream.write((byte)1); 
                }
                int nRemainingNulls = nulls % 255; // From 0 to 254
                // Write a byte for the remaining null elements
                if (nRemainingNulls > 0) {
                    // Invert so that more remaining nulls becomes smaller than 
less remaining nulls
                    // The reason is that the less remaining null array has an 
element after which would
                    // line up with a not null element in an array with fewer 
null elements.
                    byte nNullByte = SortOrder.invert((byte)nRemainingNulls);
                    oStream.write(nNullByte); // Single byte for repeating nulls
                }
            }
            return 0;
        }
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to