GitHub user selvaganesang commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/229#discussion_r48040857
--- Diff: core/sql/src/main/java/org/trafodion/sql/HBaseClient.java ---
@@ -1088,36 +1139,65 @@ public boolean estimateRowCount(String tblName, int partialRowSize,
//printQualifiers(reader, 100);
if (ROWS_TO_SAMPLE > 0 &&
totalEntries == reader.getEntries()) { // first file only
- // Trafodion column qualifiers are ordinal numbers, which
- // makes it easy to count missing (null) values. We also count
- // the non-Put KVs (typically delete-row markers) to estimate
- // their frequency in the full file set.
+
+ // Trafodion column qualifiers are ordinal numbers, but are represented
+ // as varying length unsigned little-endian integers in lexicographical
+ // order. So, for example, in a table with 260 columns, the column
+ // qualifiers (if present) will be read in this order:
+ // 1 (x'01'), 257 (x'0101'), 2 (x'02'), 258 (x'0201'), 3 (x'03'),
+ // 259 (x'0301'), 4 (x'04'), 260 (x'0401'), 5 (x'05'), 6 (x'06'),
+ // 7 (x'07'), ...
+ // We have crossed the boundary to the next row if and only if the
+ // next qualifier read is less than or equal to the previous,
+ // compared unsigned, lexicographically.
+
--- End diff --
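For reference, the row-boundary rule described in the new comment can be sketched as follows. This is only an illustration, not the code in HBaseClient.java: the class QualifierOrderSketch and the helpers encode, compareUnsigned, and countRows are hypothetical names, and the encoding is assumed to match the examples in the comment (minimal-length, little-endian bytes).

```java
import java.util.ArrayList;
import java.util.List;

class QualifierOrderSketch {
    // Assumption: a positive ordinal is stored as the minimal number of
    // bytes, least significant byte first (257 -> x'0101', 258 -> x'0201').
    static byte[] encode(int ordinal) {
        int n = 0;
        for (int v = ordinal; v != 0; v >>>= 8) n++;
        if (n == 0) n = 1;
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) out[i] = (byte) (ordinal >>> (8 * i));
        return out;
    }

    // Unsigned lexicographic comparison; a proper prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // A new row starts whenever the qualifier is <= the previous one.
    static int countRows(List<byte[]> qualifiers) {
        int rows = 0;
        byte[] prev = null;
        for (byte[] q : qualifiers) {
            if (prev == null || compareUnsigned(q, prev) <= 0) rows++;
            prev = q;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row 1 has columns 1, 257, 2, 3; row 2 has columns 1, 2, 258.
        int[] ordinals = {1, 257, 2, 3, 1, 2, 258};
        List<byte[]> quals = new ArrayList<>();
        for (int o : ordinals) quals.add(encode(o));
        System.out.println("rows seen: " + countRows(quals));
    }
}
```

Note that the `<=` (rather than `<`) matters when consecutive rows each contain only the same single qualifier.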
I wonder whether the row count could be estimated from getEntries(), the
number of columns in the table, and the number of default-value/null columns. We
also have avgKeyLen, avgValueLen, and fileSize, which might aid in row estimation.
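
To make the idea concrete, here is a rough sketch of the arithmetic I have in mind. Everything in it is an assumption for discussion: RowEstimateSketch, estimateRows, and estimateEntries are hypothetical names, avgMissingPerRow would have to come from sampling or metadata, and perKvOverhead stands in for whatever per-KeyValue framing the file adds. It also ignores delete markers, multiple versions, and compression.

```java
class RowEstimateSketch {
    // rows ~= entries / (KeyValues actually stored per row), where a row
    // stores one KeyValue per column that is not missing (null/default).
    static long estimateRows(long entries, int numCols, double avgMissingPerRow) {
        double kvPerRow = numCols - avgMissingPerRow;
        if (kvPerRow <= 0) {
            throw new IllegalArgumentException("no stored columns per row");
        }
        return Math.round(entries / kvPerRow);
    }

    // Cross-check of the entry count from file-level averages.
    // perKvOverhead is an assumed constant, not a value read from HBase.
    static long estimateEntries(long fileSize, double avgKeyLen,
                                double avgValueLen, double perKvOverhead) {
        return Math.round(fileSize / (avgKeyLen + avgValueLen + perKvOverhead));
    }

    public static void main(String[] args) {
        // 10 columns, on average 1 missing per row, 900 entries in the file.
        System.out.println("rows: " + estimateRows(900, 10, 1.0));
    }
}
```

If the average number of missing columns per row cannot be obtained cheaply, this reduces to the sampling approach in the diff.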
---