Adar Dembo created KUDU-2980:
--------------------------------

             Summary: Fault tolerant and diff scans fail if projection contains 
mis-ordered primary key columns
                 Key: KUDU-2980
                 URL: https://issues.apache.org/jira/browse/KUDU-2980
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.10.0, 1.11.0
            Reporter: Adar Dembo
            Assignee: Adar Dembo


A fault tolerant (FT) scan needs the entirety of the primary key in its 
projection in order to work properly. Prior to 1.10.0, that was because:
# FT scans sorted their results in primary key order (note: within a tablet 
only; this sort is not global). These scans used the MergeIterator to achieve 
this sorting by comparing rows via their primary keys. 
# Every FT scan RPC response included a "last primary key" which, in the event 
of failure, allowed the scan to be resumed from a particular key on another 
tserver.
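
The two requirements above can be illustrated with a toy model. This is only a sketch, not Kudu's implementation: rowsets are modeled as sorted Python lists, `heapq.merge` stands in for the MergeIterator, and `merged_rows`, `scan_batch`, and `resume_after` are hypothetical names invented for the example.

```python
import heapq
from itertools import islice

# Toy model (assumption): each rowset is already sorted by its composite
# primary key. Rows are (key_tuple, value) pairs.
ROWSETS = [
    [((1, "a"), "r1"), ((3, "a"), "r3")],
    [((2, "b"), "r2"), ((4, "a"), "r4")],
]

def merged_rows(rowsets, resume_after=None):
    """Yield rows in primary key order, skipping keys <= resume_after."""
    merged = heapq.merge(*rowsets, key=lambda row: row[0])
    if resume_after is not None:
        merged = (row for row in merged if row[0] > resume_after)
    return merged

def scan_batch(rowsets, batch_size, resume_after=None):
    """Return one batch plus the 'last primary key' needed to resume."""
    batch = list(islice(merged_rows(rowsets, resume_after), batch_size))
    last_key = batch[-1][0] if batch else resume_after
    return batch, last_key

first, last_key = scan_batch(ROWSETS, batch_size=2)
# Pretend the tserver failed here; another replica resumes from last_key.
second, _ = scan_batch(ROWSETS, batch_size=2, resume_after=last_key)
```

Both the merge and the resume step compare whole primary keys, which is why the full key has to be present in the server-side projection even when the client did not ask for it.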

Two important caveats:
# The primary key columns did not need to be part of the response sent to the 
client. They only needed to be part of the projection server-side in order to 
satisfy the above two requirements, then stripped out of the results before 
serialization. There was code in the tserver new scan path to add missing key 
columns to the projection of an FT scan so that clients needn't concern 
themselves with this.
# The order of the primary key columns in the projection didn't matter. 
Although non-obvious, this was because the same order was used in all 
MergeIterator comparisons and in all "last primary key" fields. Clients that 
relied on the "partial sort" behavior of an FT scan would no doubt have been 
surprised by the results, but the _fault tolerant_ aspect of the scan wasn't 
affected.

1.10.0 implicitly removed that last caveat by requiring the primary key columns 
of an FT scan to be in table schema order. That's because of the MergeIterator 
changes made in KUDU-2466: now the MergeIterator also compares _rowset bounds_ 
to primary keys, and rowset bounds are always stored in table schema order. 
This means that since 1.10.0, any FT scan whose server-side projection had 
mis-ordered primary key columns would fail. If you were lucky, the error would 
surface at scan start time and include either the text "key too short" or 
"Missing separator after composite key string component".
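
A toy illustration of why mixing the two orderings breaks comparisons. This sketch assumes a hypothetical two-column integer key (`k1`, `k2`) and uses plain tuples; real Kudu compares encoded keys and can fail with the decode errors quoted above, which this example does not reproduce. `encode_key` and `within_bounds` are made-up helper names.

```python
# Rowset bounds are always stored in table schema order: (k1, k2).
LOWER, UPPER = (1, 5), (2, 0)

def encode_key(row, column_order):
    """Build a comparable key from a row using the given column order."""
    return tuple(row[col] for col in column_order)

def within_bounds(row, column_order, lower, upper):
    """Compare a row's key against bounds that are in schema order."""
    return lower <= encode_key(row, column_order) <= upper

row = {"k1": 1, "k2": 7}

# Key built in schema order: the row correctly falls inside the bounds.
in_schema_order = within_bounds(row, ("k1", "k2"), LOWER, UPPER)

# Key built in a mis-ordered projection: the same row appears out of bounds.
in_projection_order = within_bounds(row, ("k2", "k1"), LOWER, UPPER)
```

Before KUDU-2466 only row keys were compared with each other, so a consistent but wrong order was harmless for fault tolerance. Once bounds in schema order enter the comparison, the two sides no longer describe the same thing.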

What kind of FT scan could cause this?
* A scan whose projection included at least two primary key columns in a 
different order than how they were ordered in the table's schema.
* A scan whose projection didn't include all primary key columns, but whose 
predicates included one or more of the primary key columns missing from the 
projection. Predicates are accumulated in a hash map (keyed by column name) 
before being serialized to the wire, so when the tserver adds missing key 
columns from predicates into the scan projection, they're effectively in random 
order.
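
Both cases can be checked with a small helper. The following is a sketch under stated assumptions, not tserver code: the key columns (`host`, `metric`, `ts`), the function names, and the way predicate columns are appended are all hypothetical and only model the behavior described above.

```python
SCHEMA_KEY = ["host", "metric", "ts"]  # hypothetical primary key, schema order

def add_missing_key_columns(projection, predicate_columns, schema_key):
    """Append key columns named only in predicates, in arrival order."""
    result = list(projection)
    for col in predicate_columns:
        if col in schema_key and col not in result:
            result.append(col)
    return result

def key_columns_mis_ordered(projection, schema_key):
    """True if the projected key columns are not in schema-relative order."""
    projected_keys = [col for col in projection if col in schema_key]
    expected = [col for col in schema_key if col in projected_keys]
    return projected_keys != expected

# Case 1: the client projects key columns in a non-schema order.
explicit = ["ts", "host", "metric", "value"]

# Case 2: key columns arrive via predicates in hash-map iteration order.
from_predicates = add_missing_key_columns(
    ["host", "value"], ["ts", "metric"], SCHEMA_KEY)

# Projecting the table schema verbatim is safe.
verbatim = ["host", "metric", "ts", "value"]
```

Under these assumptions, `key_columns_mis_ordered` returns true for `explicit` and `from_predicates` and false for `verbatim`.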

Diff scans are FT scans too, so they are also affected. However, the 
BDR Spark application is unaffected because it always projects the entire table 
schema verbatim.



