On Fri, Sep 9, 2016 at 2:26 PM, Jordan Birdsell <[email protected]> wrote:
> Threw in a helper function in the SchemaBuilder to at least make it a > little cleaner to construct a new schema with a key from the projection > schema > > ex: > builder = kudu.schema_builder() > for col in scanner.get_projection_schema(): > builder.copy_column(col) > builder.set_primary_keys(['key']) > That sounds pretty reasonable to me. if your use case is copying existing tables, though, you might also want to consider compression and encoding, which wouldn't be part of a projection, but would be part of the original table's schema. -Todd > > On Fri, Sep 9, 2016 at 3:10 PM Jordan Birdsell <[email protected]> > wrote: > > > I follow now, thanks > > > > On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <[email protected]> wrote: > > > >> Agreed with what David wrote. The dropping of 'key' status in > projections > >> is intentional. It's the same as you get in MySQL if you have something > >> like: > >> > >> CREATE TABLE t ( > >> x integer primary key, > >> y integer > >> ); > >> > >> CREATE TABLE t2 AS SELECT * FROM t; > >> > >> the 'select' results don't carry any notion of which columns were > >> originally keys. > >> > >> In Kudu it smells a little funny today because we have two restrictions > >> that MySQL doesn't have: > >> 1) our keys must be listed first in the schema > >> 2) all tables must have primary keys > >> > >> However, we're in the process of lifting the first restriction (hoping > to > >> finish it in the coming weeks), and I wouldn't be surprised if at some > >> point we lifted the second as well (some tables work just fine as 'bags' > >> with no need for PK constraints or organization). > >> > >> -Todd > >> > >> > >> On Fri, Sep 9, 2016 at 9:01 AM, David Alves <[email protected]> > >> wrote: > >> > >> > Oh I see what you mean. > >> > It's not that the keys are getting dropped, its that they're not > marked > >> as > >> > keys. > >> > This arguably makes sense on a projection: for instance you might want > >> the > >> > keys returns in the end of the projection, while table schemas (at > least > >> > for now) require that they are present at the beginning of the > >> projection. > >> > If you really want to create a new table based on an existing one, you > >> > could get the schema from KuduTable. That one should be complete. > >> > > >> > -david > >> > > >> > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell < > >> [email protected] > >> > > > >> > wrote: > >> > > >> > > Right, what i'm saying is, if i do include the key in my projection, > >> the > >> > > schema does not maintain it as a key. The issue isnt so much that i > >> cant > >> > > apply predicates to the key column, its that if i wanted to create a > >> > > projection and then want to use that projection to create a table > >> based > >> > on > >> > > that projection, i'd have to rebuild the schema (i.e., the schema > >> > returned > >> > > is effectively useless for creating new tables). This pattern of > >> > creating > >> > > tables from projections is pretty common in dataframe like libraries > >> in > >> > > python. > >> > > > >> > > gist of offending code with comment on issue: > >> > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7 > >> > > > >> > > > >> > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <[email protected]> > >> > wrote: > >> > > > >> > > > Wait, If you _do_ set a projection on the scanner that does not > >> include > >> > > the > >> > > > keys, then they won't be returned (and won't appear on the > >> projection's > >> > > > schema). > >> > > > Note that this does not mean that you can't set predicates on the > >> key, > >> > > it's > >> > > > just that they'll be evaluated server side, but the key won't > >> actually > >> > be > >> > > > returned. > >> > > > Maybe I'm misunderstanding what you're saying? > >> > > > Care to post a gist with the offending code? > >> > > > > >> > > > -david > >> > > > > >> > > > > >> > > > > >> > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell < > >> > > [email protected] > >> > > > > > >> > > > wrote: > >> > > > > >> > > > > Hey David, > >> > > > > > >> > > > > Yep, i'm sure, taking a look at the scan_configuration class, > the > >> > issue > >> > > > > seems to be here: > >> > > > > > >> > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const > >> > vector<int>& > >> > > > > col_indexes) { > >> > > > > .... > >> > > > > RETURN_NOT_OK*(s->Reset(cols, 0));* > >> > > > > .... > >> > > > > > >> > > > > In the SetProjectedColumnIndexes method (which is also used by > >> > > > > SetProjectedColumnNames), we're setting the schema without the > >> index. > >> > > > > > >> > > > > There are probably a couple of ways to address this: > >> > > > > > >> > > > > 1. Check if all key columns are in the projection, and if so, > >> > > maintain > >> > > > > the key. > >> > > > > 2. Provide an optional parameter to be able to set the key to > >> > users > >> > > > > preference for the new projection. This would be beneficial > for > >> > > cases > >> > > > > where > >> > > > > the user may want to create a new table based on their > >> projection. > >> > > > > > >> > > > > Thoughts? > >> > > > > > >> > > > > Jordan > >> > > > > > >> > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves < > >> [email protected]> > >> > > > wrote: > >> > > > > > >> > > > > > Hi Jordan > >> > > > > > > >> > > > > > KuduScanner::GetProjectionSchema returns the schema of the > >> > > projection > >> > > > > > that was previously set on the scanner. If you don't a > >> projection > >> > it > >> > > > > should > >> > > > > > indeed return all the columns. > >> > > > > > Are you sure you didn't set a projection (with > >> > > > SetProjectedColumnNames > >> > > > > > or SetProjectedColumnIndexes) that excluded the key? > >> > > > > > > >> > > > > > Best > >> > > > > > David > >> > > > > > > >> > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell < > >> > > > > [email protected] > >> > > > > > > > >> > > > > > wrote: > >> > > > > > > >> > > > > > > Hey folks, > >> > > > > > > > >> > > > > > > i was doing some work on KUDU-854 > >> > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when > >> > testing > >> > > > the > >> > > > > > > KuduScanner::GetProjectionSchema method call, found that > the > >> key > >> > > was > >> > > > > > being > >> > > > > > > dropped, which makes this much more challenging to test. Any > >> > ideas > >> > > if > >> > > > > it > >> > > > > > is > >> > > > > > > intended to drop the key information in a scanner > projection? > >> I > >> > > would > >> > > > > > > imagine this could prevent functionality like creating new > >> tables > >> > > > from > >> > > > > a > >> > > > > > > projection. > >> > > > > > > > >> > > > > > > Thanks, > >> > > > > > > Jordan > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > >> > >> -- > >> Todd Lipcon > >> Software Engineer, Cloudera > >> > > > -- Todd Lipcon Software Engineer, Cloudera
