GitHub user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/229#discussion_r47987524
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HBaseClient.java ---
    @@ -1088,36 +1139,65 @@ public boolean estimateRowCount(String tblName, int partialRowSize,
               //printQualifiers(reader, 100);
               if (ROWS_TO_SAMPLE > 0 &&
                   totalEntries == reader.getEntries()) {  // first file only
    -            // Trafodion column qualifiers are ordinal numbers, which
    -            // makes it easy to count missing (null) values. We also count
    -            // the non-Put KVs (typically delete-row markers) to estimate
    -            // their frequency in the full file set.
    +
    +            // Trafodion column qualifiers are ordinal numbers, but are represented
    +            // as varying length unsigned little-endian integers in lexicographical
    +            // order. So, for example, in a table with 260 columns, the column
    +            // qualifiers (if present) will be read in this order:
    +            // 1 (x'01'), 257 (x'0101'), 2 (x'02'), 258 (x'0201'), 3 (x'03'),
    +            // 259 (x'0301'), 4 (x'04'), 260 (x'0401'), 5 (x'05'), 6 (x'06'),
    +            // 7 (x'07'), ...
    +            // We have crossed the boundary to the next row if and only if the
    +            // next qualifier read is less than or equal to the previous,
    +            // compared unsigned, lexicographically.
    +
    --- End diff --
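The ordering described in the diff comment can be checked with a small standalone sketch. It assumes the minimal-length little-endian encoding implied by the listed examples (low-order byte first, no padding). `QualifierOrderSketch`, `encodeOrdinal`, `compareUnsigned` and `countRows` are hypothetical names, not Trafodion or HBase APIs, and the row counting simply applies the boundary rule exactly as the comment states it.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the Trafodion HBaseClient code): encodes column
 * ordinals as variable-length unsigned little-endian byte arrays, as the diff
 * comment describes, and counts rows with the "qualifier did not increase"
 * rule. All names here are illustrative assumptions.
 */
final class QualifierOrderSketch {

    // Assumed encoding: low-order byte first, using only as many bytes as
    // needed, so 1 -> {0x01}, 257 -> {0x01, 0x01}, 258 -> {0x02, 0x01}.
    static byte[] encodeOrdinal(int ordinal) {
        if (ordinal <= 0) {
            throw new IllegalArgumentException("ordinal must be positive: " + ordinal);
        }
        byte[] buf = new byte[4];
        int len = 0;
        int v = ordinal;
        while (v != 0) {
            buf[len++] = (byte) (v & 0xFF);
            v >>>= 8;
        }
        return Arrays.copyOf(buf, len);
    }

    // Unsigned lexicographic comparison; an array that is a prefix of a
    // longer one sorts first (x'01' < x'0101').
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Applies the rule stated in the diff comment: a qualifier that is less
    // than or equal to its predecessor marks the start of a new row.
    static int countRows(List<byte[]> qualifiersInScanOrder) {
        if (qualifiersInScanOrder.isEmpty()) {
            return 0;
        }
        int rows = 1;
        for (int i = 1; i < qualifiersInScanOrder.size(); i++) {
            byte[] prev = qualifiersInScanOrder.get(i - 1);
            byte[] curr = qualifiersInScanOrder.get(i);
            if (compareUnsigned(curr, prev) <= 0) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two rows: columns 1, 257, 2 in the first; columns 1, 3 in the second.
        List<byte[]> scan = Arrays.asList(
                encodeOrdinal(1), encodeOrdinal(257), encodeOrdinal(2),
                encodeOrdinal(1), encodeOrdinal(3));
        System.out.println("rows = " + countRows(scan)); // prints "rows = 2"
    }
}
```

The appeal of this approach is that each comparison touches only the qualifier bytes, which for a table of a few hundred columns are one or two bytes long.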
    
    Current code is good, though I am confused as to why we do not try something
    simpler, like comparing the key in consecutive KeyValue objects until it
    changes. There is a method on KeyValue that returns the key as a string:
    https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/KeyValue.html#getKeyString()
    Maybe we could just compare those strings?
    Is the idea that keys can be long and therefore expensive to compare?
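The alternative raised in the question can be sketched without an HBase dependency. One caveat: in HBase the KeyValue key also encodes the column family, qualifier, timestamp and type, so the `getKeyString()` output differs for every cell; the comparison would have to be limited to the row portion (with the HBase client, possibly via a helper such as `CellUtil.matchingRows`, depending on the version in use). Comparing raw bytes also avoids building a String per cell. `RowKeyBoundarySketch` and `SimpleCell` below are hypothetical stand-ins, not HBase classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the reviewer's alternative: detect a row boundary
 * when the row key of consecutive cells changes. SimpleCell is a stand-in,
 * not the HBase Cell or KeyValue class.
 */
final class RowKeyBoundarySketch {

    // Minimal stand-in for a cell read from an HFile: row key and qualifier.
    static final class SimpleCell {
        final byte[] row;
        final byte[] qualifier;

        SimpleCell(byte[] row, byte[] qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Counts rows by comparing only the row-key bytes of consecutive cells.
    // Each comparison may scan up to the full row-key length, whereas the
    // qualifier rule compares only the (typically one- or two-byte) qualifiers.
    static int countRows(List<SimpleCell> cellsInScanOrder) {
        int rows = 0;
        byte[] previousRow = null;
        for (SimpleCell cell : cellsInScanOrder) {
            if (previousRow == null || !Arrays.equals(previousRow, cell.row)) {
                rows++;
                previousRow = cell.row;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] rowA = "customer#0001".getBytes(StandardCharsets.UTF_8);
        byte[] rowB = "customer#0002".getBytes(StandardCharsets.UTF_8);
        List<SimpleCell> scan = Arrays.asList(
                new SimpleCell(rowA, new byte[] {0x01}),
                new SimpleCell(rowA, new byte[] {0x02}),
                new SimpleCell(rowB, new byte[] {0x02}));
        System.out.println("rows = " + countRows(scan)); // prints "rows = 2"
    }
}
```

This version does not depend on how qualifiers are encoded; the trade-off is that its per-cell cost grows with row-key length, which is the concern the question raises.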


