On Fri, Sep 9, 2016 at 2:26 PM, Jordan Birdsell <[email protected]>
wrote:

> Threw in a helper function in the SchemaBuilder to at least make it a
> little cleaner to construct a new schema with a key from the projection
> schema
>
> ex:
>         builder = kudu.schema_builder()
>         for col in scanner.get_projection_schema():
>             builder.copy_column(col)
>         builder.set_primary_keys(['key'])
>

That sounds pretty reasonable to me.

if your use case is copying existing tables, though, you might also want to
consider compression and encoding, which wouldn't be part of a projection,
but would be part of the original table's schema.

-Todd


>
> On Fri, Sep 9, 2016 at 3:10 PM Jordan Birdsell <[email protected]>
> wrote:
>
> > I follow now, thanks
> >
> > On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <[email protected]> wrote:
> >
> >> Agreed with what David wrote. The dropping of 'key' status in
> projections
> >> is intentional. It's the same as you get in MySQL if you have something
> >> like:
> >>
> >> CREATE TABLE t (
> >>   x integer primary key,
> >>   y integer
> >> );
> >>
> >> CREATE TABLE t2 AS SELECT * FROM t;
> >>
> >> the 'select' results don't carry any notion of which columns were
> >> originally keys.
> >>
> >> In Kudu it smells a little funny today because we have two restrictions
> >> that MySQL doesn't have:
> >> 1) our keys must be listed first in the schema
> >> 2) all tables must have primary keys
> >>
> >> However, we're in the process of lifting the first restriction (hoping
> to
> >> finish it in the coming weeks), and I wouldn't be surprised if at some
> >> point we lifted the second as well (some tables work just fine as 'bags'
> >> with no need for PK constraints or organization).
> >>
> >> -Todd
> >>
> >>
> >> On Fri, Sep 9, 2016 at 9:01 AM, David Alves <[email protected]>
> >> wrote:
> >>
> >> > Oh I see what you mean.
> >> > It's not that the keys are getting dropped, its that they're not
> marked
> >> as
> >> > keys.
> >> > This arguably makes sense on a projection: for instance you might want
> >> the
> >> > keys returns in the end of the projection, while table schemas (at
> least
> >> > for now) require that they are present at the beginning of the
> >> projection.
> >> > If you really want to create a new table based on an existing one, you
> >> > could get the schema from KuduTable. That one should be complete.
> >> >
> >> > -david
> >> >
> >> > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell <
> >> [email protected]
> >> > >
> >> > wrote:
> >> >
> >> > > Right, what i'm saying is, if i do include the key in my projection,
> >> the
> >> > > schema does not maintain it as a key.  The issue isnt so much that i
> >> cant
> >> > > apply predicates to the key column, its that if i wanted to create a
> >> > > projection and then want to use that projection to create a table
> >> based
> >> > on
> >> > > that projection, i'd have to rebuild the schema (i.e., the schema
> >> > returned
> >> > > is effectively useless for creating new tables).  This pattern of
> >> > creating
> >> > > tables from projections is pretty common in dataframe like libraries
> >> in
> >> > > python.
> >> > >
> >> > > gist of offending code with comment on issue:
> >> > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7
> >> > >
> >> > >
> >> > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <[email protected]>
> >> > wrote:
> >> > >
> >> > > > Wait, If you _do_ set a projection on the scanner that does not
> >> include
> >> > > the
> >> > > > keys, then they won't be returned (and won't appear on the
> >> projection's
> >> > > > schema).
> >> > > > Note that this does not mean that you can't set predicates on the
> >> key,
> >> > > it's
> >> > > > just that they'll be evaluated server side, but the key won't
> >> actually
> >> > be
> >> > > > returned.
> >> > > > Maybe I'm misunderstanding what you're saying?
> >> > > > Care to post a gist with the offending code?
> >> > > >
> >> > > > -david
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell <
> >> > > [email protected]
> >> > > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hey David,
> >> > > > >
> >> > > > > Yep, i'm sure, taking a look at the scan_configuration class,
> the
> >> > issue
> >> > > > > seems to be here:
> >> > > > >
> >> > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const
> >> > vector<int>&
> >> > > > > col_indexes) {
> >> > > > > ....
> >> > > > >   RETURN_NOT_OK*(s->Reset(cols, 0));*
> >> > > > > ....
> >> > > > >
> >> > > > > In the SetProjectedColumnIndexes method (which is also used by
> >> > > > > SetProjectedColumnNames), we're setting the schema without the
> >> index.
> >> > > > >
> >> > > > > There are probably a couple of ways to address this:
> >> > > > >
> >> > > > >    1. Check if all key columns are in the projection, and if so,
> >> > > maintain
> >> > > > >    the key.
> >> > > > >    2. Provide an optional parameter to be able to set the key to
> >> > users
> >> > > > >    preference for the new projection. This would be beneficial
> for
> >> > > cases
> >> > > > > where
> >> > > > >    the user may want to create a new table based on their
> >> projection.
> >> > > > >
> >> > > > > Thoughts?
> >> > > > >
> >> > > > > Jordan
> >> > > > >
> >> > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <
> >> [email protected]>
> >> > > > wrote:
> >> > > > >
> >> > > > > > Hi Jordan
> >> > > > > >
> >> > > > > >   KuduScanner::GetProjectionSchema returns the schema of the
> >> > > projection
> >> > > > > > that was previously set on the scanner. If you don't a
> >> projection
> >> > it
> >> > > > > should
> >> > > > > > indeed return all the columns.
> >> > > > > >   Are you sure you didn't set a projection (with
> >> > > > SetProjectedColumnNames
> >> > > > > > or SetProjectedColumnIndexes) that excluded the key?
> >> > > > > >
> >> > > > > > Best
> >> > > > > > David
> >> > > > > >
> >> > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell <
> >> > > > > [email protected]
> >> > > > > > >
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hey folks,
> >> > > > > > >
> >> > > > > > > i was doing some work on KUDU-854
> >> > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when
> >> > testing
> >> > > > the
> >> > > > > > > KuduScanner::GetProjectionSchema method call, found that
> the
> >> key
> >> > > was
> >> > > > > > being
> >> > > > > > > dropped, which makes this much more challenging to test. Any
> >> > ideas
> >> > > if
> >> > > > > it
> >> > > > > > is
> >> > > > > > > intended to drop the key information in a scanner
> projection?
> >> I
> >> > > would
> >> > > > > > > imagine this could prevent functionality like creating new
> >> tables
> >> > > > from
> >> > > > > a
> >> > > > > > > projection.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Jordan
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Reply via email to