[
https://issues.apache.org/jira/browse/KUDU-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adar Dembo resolved KUDU-2980.
------------------------------
Fix Version/s: 1.11.0
Resolution: Fixed
Fixed in commit 08db97c59 (branch-1.11.x) and e23e52a1b (master).
> Fault tolerant and diff scans fail if projection contains mis-ordered primary
> key columns
> -----------------------------------------------------------------------------------------
>
> Key: KUDU-2980
> URL: https://issues.apache.org/jira/browse/KUDU-2980
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.10.0, 1.11.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Major
> Fix For: 1.11.0
>
>
> A fault tolerant (FT) scan needs the entirety of the primary key in its
> projection in order to work properly. Prior to 1.10.0, that was because:
> # FT scans sorted their results in primary key order (note: within a tablet
> only; this sort is not global). These scans used the MergeIterator to achieve
> this sorting by comparing rows via their primary keys.
> # Every FT scan RPC response included a "last primary key" which, in the
> event of failure, allowed the scan to be resumed from a particular key on
> another tserver.
> Two important caveats:
> # The primary key columns did not need to be part of the response sent to the
> client. They only needed to be part of the projection server-side in order to
> satisfy the above two requirements, then stripped out of the results before
> serialization. There was code in the tserver new scan path to add missing key
> columns to the projection of an FT scan so that clients needn't concern
> themselves with this.
> # The order of the primary key columns in the projection didn't matter.
> Although non-obvious, this was because the same order was used in all
> MergeIterator comparisons and in all "last primary key" fields. Clients that
> relied on the "partial sort" behavior of an FT scan would no doubt have been
> surprised by the results, but the _fault tolerant_ aspect of the scan
> wasn't affected.
> 1.10.0 implicitly removed that last caveat by requiring the primary key
> columns of an FT scan to be in table schema order. That's because of the
> MergeIterator changes made in KUDU-2466: now the MergeIterator also compares
> _rowset bounds_ to primary keys, and rowset bounds are always stored in table
> schema order. This means that since 1.10.0, any FT scan whose server-side
> projection had mis-ordered primary key columns would fail. If you were lucky,
> the error would surface at scan start time and include either the text "key
> too short" or "Missing separator after composite key string component".
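
To make the rowset bounds problem above concrete, here is a minimal toy sketch. It is not Kudu code: keys are modeled as plain Python tuples rather than encoded byte strings, and the table (PRIMARY KEY (a, b)), the projection order, and the helper names are all hypothetical. It only shows that a row key built in projection order cannot be meaningfully compared to a bound stored in schema order.

```python
# Toy model, not Kudu code: keys are plain tuples, not encoded byte strings.
SCHEMA_KEY_ORDER = ("a", "b")      # hypothetical table: PRIMARY KEY (a, b)
PROJECTED_KEY_ORDER = ("b", "a")   # hypothetical mis-ordered projection


def key_in_order(row, order):
    """Build a comparable key from the row's columns in the given order."""
    return tuple(row[col] for col in order)


def row_precedes_rowset(row, rowset_lower_bound, order):
    """Return True if the row's key sorts before the rowset's lower bound.

    rowset_lower_bound is always in schema order; the row key uses `order`.
    """
    return key_in_order(row, order) < rowset_lower_bound


row = {"a": 1, "b": 9}
lower_bound = (2, 0)  # a=2, b=0, stored in schema order

# Schema order compares (1, 9) with (2, 0); projection order compares (9, 1).
correct = row_precedes_rowset(row, lower_bound, SCHEMA_KEY_ORDER)
wrong = row_precedes_rowset(row, lower_bound, PROJECTED_KEY_ORDER)
```

In this toy the mismatch shows up as a silently wrong comparison. Real keys are encoded and the key columns may have different types, which is presumably why the mismatch can instead surface as decoding errors like the ones quoted above.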
> What kind of FT scan could cause this?
> * A scan whose projection included at least two primary key columns in an
> order different from their order in the table's schema.
> * A scan whose projection didn't include all primary key columns, but whose
> predicates included one or more of the primary key columns missing from the
> projection. Predicates are accumulated in a hash map (keyed by column name)
> before being serialized to the wire, so when the tserver adds missing key
> columns from predicates into the scan projection, they're effectively in
> random order.
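
The second case can be sketched as follows. This is an illustration under stated assumptions, not the tserver's actual code: the table (PRIMARY KEY (a, b, c)), the column names, and all three helpers are hypothetical, and a plain list stands in for the arbitrary iteration order of the predicate hash map. The `normalize` helper shows one possible way to restore schema order and is not necessarily what the fix commits do.

```python
KEY_COLUMNS = ("a", "b", "c")  # hypothetical table: PRIMARY KEY (a, b, c)


def add_missing_key_columns(projection, predicate_columns):
    """Append key columns that appear in predicates but not in the projection,
    in whatever order the predicates happen to be iterated."""
    result = list(projection)
    for col in predicate_columns:
        if col in KEY_COLUMNS and col not in result:
            result.append(col)
    return result


def key_columns_in_schema_order(projection):
    """Check that the projected key columns appear in table schema order."""
    projected_keys = [c for c in projection if c in KEY_COLUMNS]
    expected = [c for c in KEY_COLUMNS if c in projected_keys]
    return projected_keys == expected


def normalize(projection):
    """One possible normalization: all key columns first, in schema order."""
    non_key = [c for c in projection if c not in KEY_COLUMNS]
    return list(KEY_COLUMNS) + non_key


# The client projects (a, v) and has predicates on c and b; the predicate
# order below stands in for hash map iteration order.
server_projection = add_missing_key_columns(["a", "v"], ["c", "b"])
is_ordered = key_columns_in_schema_order(server_projection)
fixed_projection = normalize(server_projection)
```

Here `server_projection` ends up with its key columns in the order a, c, b, which the check rejects, while the normalized projection passes it.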
> Diff scans, by virtue of also being FT scans, are also affected. However, the
> BDR Spark application is unaffected because it always projects the entire
> table schema verbatim.