[ 
https://issues.apache.org/jira/browse/KUDU-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo resolved KUDU-2980.
------------------------------
    Fix Version/s: 1.11.0
       Resolution: Fixed

Fixed in commit 08db97c59 (branch-1.11.x) and e23e52a1b (master).

> Fault tolerant and diff scans fail if projection contains mis-ordered primary 
> key columns
> -----------------------------------------------------------------------------------------
>
>                 Key: KUDU-2980
>                 URL: https://issues.apache.org/jira/browse/KUDU-2980
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Major
>             Fix For: 1.11.0
>
>
> A fault tolerant (FT) scan needs the entirety of the primary key in its 
> projection in order to work properly. Prior to 1.10.0, that was because:
> # FT scans sorted their results in primary key order (note: within a tablet 
> only; this sort is not global). These scans used the MergeIterator to achieve 
> this sorting by comparing rows via their primary keys. 
> # Every FT scan RPC response included a "last primary key" which, in the 
> event of failure, allowed the scan to be resumed from a particular key on 
> another tserver.
> Two important caveats:
> # The primary key columns did not need to be part of the response sent to the 
> client. They only needed to be part of the projection server-side in order to 
> satisfy the above two requirements, then stripped out of the results before 
> serialization. There was code in the tserver new scan path to add missing key 
> columns to the projection of an FT scan so that clients needn't concern 
> themselves with this.
> # The order of the primary key columns in the projection didn't matter. 
> Although non-obvious, this was because the same order was used in all 
> MergeIterator comparisons and in all "last primary key" fields. Clients that 
> relied on the "partial sort" behavior of an FT scan would no doubt have been 
> surprised by the results, but the _fault tolerant_ aspect of the scan 
> wasn't affected.
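A minimal sketch of why the pre-1.10.0 behavior tolerated mis-ordered key columns, assuming a toy model in which rowsets are lists of dicts and keys are plain tuples. The names `key_of`, `merge_scan`, and `resume_scan` are hypothetical and are not Kudu APIs. `heapq.merge` stands in for the MergeIterator. As long as the merge and the "last primary key" use the same column order, resuming loses and duplicates nothing:

```python
import heapq

# Hypothetical toy data: a table whose primary key is (a, b) in schema order,
# split across two "rowsets". This is not Kudu's storage format.
ROWSET_1 = [{"a": 1, "b": 9}, {"a": 3, "b": 1}]
ROWSET_2 = [{"a": 2, "b": 5}, {"a": 2, "b": 7}]


def key_of(row, key_cols):
    """Build a comparable key from the key columns, in the order given."""
    return tuple(row[c] for c in key_cols)


def merge_scan(rowsets, key_cols):
    """Merge rowsets into one stream ordered by key_cols (MergeIterator stand-in)."""
    runs = [sorted(rs, key=lambda r: key_of(r, key_cols)) for rs in rowsets]
    return list(heapq.merge(*runs, key=lambda r: key_of(r, key_cols)))


def resume_scan(rowsets, key_cols, last_key):
    """Restart the scan elsewhere, skipping rows at or before last_key."""
    return [r for r in merge_scan(rowsets, key_cols)
            if key_of(r, key_cols) > last_key]


# Key columns deliberately listed in non-schema order.
MISORDERED = ("b", "a")
full = merge_scan([ROWSET_1, ROWSET_2], MISORDERED)

# Pretend the first tserver failed after returning two rows.
delivered = full[:2]
last_key = key_of(delivered[-1], MISORDERED)
resumed = resume_scan([ROWSET_1, ROWSET_2], MISORDERED, last_key)

# No row is lost or duplicated, even though the sort is not in schema order.
assert delivered + resumed == full
```

The rows come back sorted by (b, a) rather than (a, b), which is the surprising "partial sort" mentioned above, but the resume point is still well defined.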
> 1.10.0 implicitly removed that last caveat by requiring the primary key 
> columns of an FT scan to be in table schema order. That's because of the 
> MergeIterator changes made in KUDU-2466: now the MergeIterator also compares 
> _rowset bounds_ to primary keys, and rowset bounds are always stored in table 
> schema order. This means that since 1.10.0, any FT scan whose server-side 
> projection had mis-ordered primary key columns would fail. If you were lucky, 
> the error would surface at scan start time and include either the text "key 
> too short" or "Missing separator after composite key string component".
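To illustrate the mismatch introduced in 1.10.0, here is a hedged sketch under the same toy assumptions (tuple keys, hypothetical helper names `encode`, `bounds`, and `within_bounds`). It does not reproduce Kudu's encoded-key format or the error messages quoted above. It only shows that a row key built in projection order is not comparable with rowset bounds built in schema order:

```python
# Hypothetical schema: primary key columns in table schema order.
SCHEMA_KEY = ("a", "b")


def encode(row, cols):
    """Toy key encoding: a tuple of values in the given column order."""
    return tuple(row[c] for c in cols)


def bounds(rowset):
    """Rowset bounds are always computed in table schema order."""
    keys = [encode(r, SCHEMA_KEY) for r in rowset]
    return min(keys), max(keys)


def within_bounds(row, rowset, projection_key_cols):
    """Compare a row key (projection order) against bounds (schema order)."""
    lo, hi = bounds(rowset)
    return lo <= encode(row, projection_key_cols) <= hi


rowset = [{"a": 1, "b": 9}, {"a": 3, "b": 1}]
row = rowset[0]

# With key columns in schema order, the row falls inside its own rowset.
assert within_bounds(row, rowset, ("a", "b"))

# With mis-ordered key columns, the same row appears to lie outside it.
assert not within_bounds(row, rowset, ("b", "a"))
```

In this toy model the comparison silently gives a wrong answer; in the real tserver, per the description above, the failure could instead surface as a decoding error at scan start.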
> What kind of FT scan could cause this?
> * A scan whose projection included at least two primary key columns in an 
> order different from their order in the table's schema.
> * A scan whose projection didn't include all primary key columns, but whose 
> predicates included one or more of the primary key columns missing from the 
> projection. Predicates are accumulated in a hash map (keyed by column name) 
> before being serialized to the wire, so when the tserver adds missing key 
> columns from predicates into the scan projection, they're effectively in 
> random order.
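The two cases above can be sketched as a client-side check. This is one reading of the description, not code from Kudu; `projected_key_order_matches` and `may_be_affected` are hypothetical names, and the sketch assumes column names are unique within a projection:

```python
def projected_key_order_matches(projection, schema_key):
    """True if the key columns present in the projection keep schema order."""
    projected = [c for c in projection if c in schema_key]
    expected = [c for c in schema_key if c in projection]
    return projected == expected


def may_be_affected(projection, predicate_cols, schema_key):
    """Flag FT scans matching either case described above."""
    # Case 1: two or more projected key columns out of schema order.
    if not projected_key_order_matches(projection, schema_key):
        return True
    # Case 2: a key column missing from the projection has a predicate,
    # so the tserver would add it in an effectively random position.
    missing = set(schema_key) - set(projection)
    return bool(missing & set(predicate_cols))


# Hypothetical table: key columns (a, b, c) plus a value column v.
KEY = ("a", "b", "c")

assert may_be_affected(["b", "a", "v"], [], KEY)               # case 1
assert may_be_affected(["a", "v"], ["b"], KEY)                 # case 2
assert not may_be_affected(["a", "b", "c", "v"], ["b"], KEY)   # full key, in order
```

Projecting every key column in schema order, as in the last call, sidesteps both cases in this model.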
> Diff scans, being FT scans themselves, are also affected. However, the 
> BDR Spark application is unaffected because it always projects the entire 
> table schema verbatim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
