Adar Dembo created KUDU-2980:
--------------------------------
Summary: Fault tolerant and diff scans fail if projection contains
mis-ordered primary key columns
Key: KUDU-2980
URL: https://issues.apache.org/jira/browse/KUDU-2980
Project: Kudu
Issue Type: Bug
Components: tserver
Affects Versions: 1.10.0, 1.11.0
Reporter: Adar Dembo
Assignee: Adar Dembo
A fault tolerant (FT) scan needs the entirety of the primary key in its
projection in order to work properly. Prior to 1.10.0, that was because:
# FT scans sorted their results in primary key order (note: within a tablet
only; this sort is not global). These scans used the MergeIterator to achieve
this sorting by comparing rows via their primary keys.
# Every FT scan RPC response included a "last primary key" which, in the event
of failure, allowed the scan to be resumed from a particular key on another
tserver.
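
The two requirements above can be sketched in a few lines of Python. This is only an illustration, not Kudu's implementation: the table layout, the names KEY_COLUMNS, primary_key and ft_scan, and the use of heapq.merge as a stand-in for the MergeIterator are all assumptions made for the example.

```python
import heapq

# Hypothetical table: the primary key is (host, ts); "value" is a non-key column.
KEY_COLUMNS = ("host", "ts")


def primary_key(row):
    """Extract the primary key tuple that every comparison is based on."""
    return tuple(row[c] for c in KEY_COLUMNS)


def ft_scan(rowsets, resume_after=None):
    """Yield rows from individually sorted rowsets in primary key order.

    If resume_after is given, rows at or before that key are skipped,
    which is how a resumed scan on another replica would pick up.
    """
    for row in heapq.merge(*rowsets, key=primary_key):
        if resume_after is not None and primary_key(row) <= resume_after:
            continue
        yield row


# Two rowsets, each already sorted by primary key.
rowset_a = [{"host": "a", "ts": 1, "value": 10}, {"host": "c", "ts": 1, "value": 30}]
rowset_b = [{"host": "a", "ts": 2, "value": 20}, {"host": "b", "ts": 5, "value": 25}]

# The first "RPC response" carries two rows plus the last primary key seen.
scan = ft_scan([rowset_a, rowset_b])
first_batch = [next(scan), next(scan)]
last_primary_key = primary_key(first_batch[-1])

# Simulate a failure: a new scan elsewhere resumes after the recorded key.
resumed = list(ft_scan([rowset_a, rowset_b], resume_after=last_primary_key))
```

Both the merge and the resume step depend on primary_key() being computable for every row, which is why the whole key has to be present in the server-side projection.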
Two important caveats:
# The primary key columns did not need to be part of the response sent to the
client. They only needed to be part of the projection server-side in order to
satisfy the above two requirements, then stripped out of the results before
serialization. There was code in the tserver new scan path to add missing key
columns to the projection of an FT scan so that clients needn't concern
themselves with this.
# The order of the primary key columns in the projection didn't matter.
Although non-obvious, this was because the same order was used in all
MergeIterator comparisons and in all "last primary key" fields. Clients that
relied on the "partial sort" behavior of an FT scan would no doubt have been
surprised by the results, but the _fault tolerant_ aspect of the scan wasn't
affected.
1.10.0 implicitly removed that last caveat by requiring the primary key columns
of an FT scan to be in table schema order. That's because of the MergeIterator
changes made in KUDU-2466: now the MergeIterator also compares _rowset bounds_
to primary keys, and rowset bounds are always stored in table schema order.
This means that since 1.10.0, any FT scan whose server-side projection has
mis-ordered primary key columns fails. If you are lucky, the error surfaces
at scan start time and includes either the text "key too short" or
"Missing separator after composite key string component".
What kind of FT scan could cause this?
* A scan whose projection included at least two primary key columns in an
order different from their order in the table's schema.
* A scan whose projection didn't include all primary key columns, but whose
predicates included one or more of the primary key columns missing from the
projection. Predicates are accumulated in a hash map (keyed by column name)
before being serialized to the wire, so when the tserver adds missing key
columns from predicates into the scan projection, they're effectively in random
order.
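
Both cases can be detected, and avoided, on the caller's side by always projecting the full primary key in table schema order. The helpers below are a hypothetical sketch of that idea: the function names and the example columns are invented for illustration and are not part of any Kudu client API.

```python
def key_columns_in_schema_order(projection, schema_key_columns):
    """Return True if the projected key columns follow table schema order."""
    projected = [c for c in projection if c in schema_key_columns]
    expected = [c for c in schema_key_columns if c in projection]
    return projected == expected


def normalize_projection(projection, schema_key_columns):
    """Project every key column, in schema order, ahead of the other columns.

    With all key columns present, the server has no missing key columns
    to append from the predicates.
    """
    non_keys = [c for c in projection if c not in schema_key_columns]
    return list(schema_key_columns) + non_keys


# Hypothetical table whose primary key is (host, metric, ts).
schema_keys = ("host", "metric", "ts")
projection = ["value", "ts", "host"]

was_ordered = key_columns_in_schema_order(projection, schema_keys)
fixed = normalize_projection(projection, schema_keys)
is_ordered = key_columns_in_schema_order(fixed, schema_keys)
```

Note that normalizing this way changes the column order of the rows returned to the caller, so any code that reads columns by position would need to be adjusted.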
Diff scans, by virtue of being FT scans, are also affected. However, the
BDR Spark application is unaffected because it always projects the entire table
schema verbatim.