Agreed with what David wrote. The dropping of 'key' status in projections is intentional. It's the same as you get in MySQL if you have something like:
CREATE TABLE t ( x integer primary key, y integer ); CREATE TABLE t2 AS SELECT * FROM t; the 'select' results don't carry any notion of which columns were originally keys. In Kudu it smells a little funny today because we have two restrictions that MySQL doesn't have: 1) our keys must be listed first in the schema 2) all tables must have primary keys However, we're in the process of lifting the first restriction (hoping to finish it in the coming weeks), and I wouldn't be surprised if at some point we lifted the second as well (some tables work just fine as 'bags' with no need for PK constraints or organization). -Todd On Fri, Sep 9, 2016 at 9:01 AM, David Alves <[email protected]> wrote: > Oh I see what you mean. > It's not that the keys are getting dropped, its that they're not marked as > keys. > This arguably makes sense on a projection: for instance you might want the > keys returns in the end of the projection, while table schemas (at least > for now) require that they are present at the beginning of the projection. > If you really want to create a new table based on an existing one, you > could get the schema from KuduTable. That one should be complete. > > -david > > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell <[email protected] > > > wrote: > > > Right, what i'm saying is, if i do include the key in my projection, the > > schema does not maintain it as a key. The issue isnt so much that i cant > > apply predicates to the key column, its that if i wanted to create a > > projection and then want to use that projection to create a table based > on > > that projection, i'd have to rebuild the schema (i.e., the schema > returned > > is effectively useless for creating new tables). This pattern of > creating > > tables from projections is pretty common in dataframe like libraries in > > python. > > > > gist of offending code with comment on issue: > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7 > > > > > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <[email protected]> > wrote: > > > > > Wait, If you _do_ set a projection on the scanner that does not include > > the > > > keys, then they won't be returned (and won't appear on the projection's > > > schema). > > > Note that this does not mean that you can't set predicates on the key, > > it's > > > just that they'll be evaluated server side, but the key won't actually > be > > > returned. > > > Maybe I'm misunderstanding what you're saying? > > > Care to post a gist with the offending code? > > > > > > -david > > > > > > > > > > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell < > > [email protected] > > > > > > > wrote: > > > > > > > Hey David, > > > > > > > > Yep, i'm sure, taking a look at the scan_configuration class, the > issue > > > > seems to be here: > > > > > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const > vector<int>& > > > > col_indexes) { > > > > .... > > > > RETURN_NOT_OK*(s->Reset(cols, 0));* > > > > .... > > > > > > > > In the SetProjectedColumnIndexes method (which is also used by > > > > SetProjectedColumnNames), we're setting the schema without the index. > > > > > > > > There are probably a couple of ways to address this: > > > > > > > > 1. Check if all key columns are in the projection, and if so, > > maintain > > > > the key. > > > > 2. Provide an optional parameter to be able to set the key to > users > > > > preference for the new projection. This would be beneficial for > > cases > > > > where > > > > the user may want to create a new table based on their projection. > > > > > > > > Thoughts? > > > > > > > > Jordan > > > > > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <[email protected]> > > > wrote: > > > > > > > > > Hi Jordan > > > > > > > > > > KuduScanner::GetProjectionSchema returns the schema of the > > projection > > > > > that was previously set on the scanner. If you don't a projection > it > > > > should > > > > > indeed return all the columns. > > > > > Are you sure you didn't set a projection (with > > > SetProjectedColumnNames > > > > > or SetProjectedColumnIndexes) that excluded the key? > > > > > > > > > > Best > > > > > David > > > > > > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell < > > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > Hey folks, > > > > > > > > > > > > i was doing some work on KUDU-854 > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when > testing > > > the > > > > > > KuduScanner::GetProjectionSchema method call, found that the key > > was > > > > > being > > > > > > dropped, which makes this much more challenging to test. Any > ideas > > if > > > > it > > > > > is > > > > > > intended to drop the key information in a scanner projection? I > > would > > > > > > imagine this could prevent functionality like creating new tables > > > from > > > > a > > > > > > projection. > > > > > > > > > > > > Thanks, > > > > > > Jordan > > > > > > > > > > > > > > > > > > > > > -- Todd Lipcon Software Engineer, Cloudera
