Agreed, I considered this, however I decided to consider it in the same
vein as the key issue. Especially considering that those attributes
currently aren't accessible from the KuduColumnSchema class. Im thinking
that I'll just note it in the method comments that compression and encoding
will need to be reset if desired.

On Fri, Sep 9, 2016, 7:03 PM Todd Lipcon <[email protected]> wrote:

> On Fri, Sep 9, 2016 at 2:26 PM, Jordan Birdsell <[email protected]
> >
> wrote:
>
> > Threw in a helper function in the SchemaBuilder to at least make it a
> > little cleaner to construct a new schema with a key from the projection
> > schema
> >
> > ex:
> >         builder = kudu.schema_builder()
> >         for col in scanner.get_projection_schema():
> >             builder.copy_column(col)
> >         builder.set_primary_keys(['key'])
> >
>
> That sounds pretty reasonable to me.
>
> if your use case is copying existing tables, though, you might also want to
> consider compression and encoding, which wouldn't be part of a projection,
> but would be part of the original table's schema.
>
> -Todd
>
>
> >
> > On Fri, Sep 9, 2016 at 3:10 PM Jordan Birdsell <
> [email protected]>
> > wrote:
> >
> > > I follow now, thanks
> > >
> > > On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <[email protected]> wrote:
> > >
> > >> Agreed with what David wrote. The dropping of 'key' status in
> > projections
> > >> is intentional. It's the same as you get in MySQL if you have
> something
> > >> like:
> > >>
> > >> CREATE TABLE t (
> > >>   x integer primary key,
> > >>   y integer
> > >> );
> > >>
> > >> CREATE TABLE t2 AS SELECT * FROM t;
> > >>
> > >> the 'select' results don't carry any notion of which columns were
> > >> originally keys.
> > >>
> > >> In Kudu it smells a little funny today because we have two
> restrictions
> > >> that MySQL doesn't have:
> > >> 1) our keys must be listed first in the schema
> > >> 2) all tables must have primary keys
> > >>
> > >> However, we're in the process of lifting the first restriction (hoping
> > to
> > >> finish it in the coming weeks), and I wouldn't be surprised if at some
> > >> point we lifted the second as well (some tables work just fine as
> 'bags'
> > >> with no need for PK constraints or organization).
> > >>
> > >> -Todd
> > >>
> > >>
> > >> On Fri, Sep 9, 2016 at 9:01 AM, David Alves <[email protected]>
> > >> wrote:
> > >>
> > >> > Oh I see what you mean.
> > >> > It's not that the keys are getting dropped, its that they're not
> > marked
> > >> as
> > >> > keys.
> > >> > This arguably makes sense on a projection: for instance you might
> want
> > >> the
> > >> > keys returns in the end of the projection, while table schemas (at
> > least
> > >> > for now) require that they are present at the beginning of the
> > >> projection.
> > >> > If you really want to create a new table based on an existing one,
> you
> > >> > could get the schema from KuduTable. That one should be complete.
> > >> >
> > >> > -david
> > >> >
> > >> > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell <
> > >> [email protected]
> > >> > >
> > >> > wrote:
> > >> >
> > >> > > Right, what i'm saying is, if i do include the key in my
> projection,
> > >> the
> > >> > > schema does not maintain it as a key.  The issue isnt so much
> that i
> > >> cant
> > >> > > apply predicates to the key column, its that if i wanted to
> create a
> > >> > > projection and then want to use that projection to create a table
> > >> based
> > >> > on
> > >> > > that projection, i'd have to rebuild the schema (i.e., the schema
> > >> > returned
> > >> > > is effectively useless for creating new tables).  This pattern of
> > >> > creating
> > >> > > tables from projections is pretty common in dataframe like
> libraries
> > >> in
> > >> > > python.
> > >> > >
> > >> > > gist of offending code with comment on issue:
> > >> > >
> https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7
> > >> > >
> > >> > >
> > >> > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <
> [email protected]>
> > >> > wrote:
> > >> > >
> > >> > > > Wait, If you _do_ set a projection on the scanner that does not
> > >> include
> > >> > > the
> > >> > > > keys, then they won't be returned (and won't appear on the
> > >> projection's
> > >> > > > schema).
> > >> > > > Note that this does not mean that you can't set predicates on
> the
> > >> key,
> > >> > > it's
> > >> > > > just that they'll be evaluated server side, but the key won't
> > >> actually
> > >> > be
> > >> > > > returned.
> > >> > > > Maybe I'm misunderstanding what you're saying?
> > >> > > > Care to post a gist with the offending code?
> > >> > > >
> > >> > > > -david
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell <
> > >> > > [email protected]
> > >> > > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hey David,
> > >> > > > >
> > >> > > > > Yep, i'm sure, taking a look at the scan_configuration class,
> > the
> > >> > issue
> > >> > > > > seems to be here:
> > >> > > > >
> > >> > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const
> > >> > vector<int>&
> > >> > > > > col_indexes) {
> > >> > > > > ....
> > >> > > > >   RETURN_NOT_OK*(s->Reset(cols, 0));*
> > >> > > > > ....
> > >> > > > >
> > >> > > > > In the SetProjectedColumnIndexes method (which is also used by
> > >> > > > > SetProjectedColumnNames), we're setting the schema without the
> > >> index.
> > >> > > > >
> > >> > > > > There are probably a couple of ways to address this:
> > >> > > > >
> > >> > > > >    1. Check if all key columns are in the projection, and if
> so,
> > >> > > maintain
> > >> > > > >    the key.
> > >> > > > >    2. Provide an optional parameter to be able to set the key
> to
> > >> > users
> > >> > > > >    preference for the new projection. This would be beneficial
> > for
> > >> > > cases
> > >> > > > > where
> > >> > > > >    the user may want to create a new table based on their
> > >> projection.
> > >> > > > >
> > >> > > > > Thoughts?
> > >> > > > >
> > >> > > > > Jordan
> > >> > > > >
> > >> > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <
> > >> [email protected]>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Jordan
> > >> > > > > >
> > >> > > > > >   KuduScanner::GetProjectionSchema returns the schema of the
> > >> > > projection
> > >> > > > > > that was previously set on the scanner. If you don't a
> > >> projection
> > >> > it
> > >> > > > > should
> > >> > > > > > indeed return all the columns.
> > >> > > > > >   Are you sure you didn't set a projection (with
> > >> > > > SetProjectedColumnNames
> > >> > > > > > or SetProjectedColumnIndexes) that excluded the key?
> > >> > > > > >
> > >> > > > > > Best
> > >> > > > > > David
> > >> > > > > >
> > >> > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell <
> > >> > > > > [email protected]
> > >> > > > > > >
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hey folks,
> > >> > > > > > >
> > >> > > > > > > i was doing some work on KUDU-854
> > >> > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when
> > >> > testing
> > >> > > > the
> > >> > > > > > > KuduScanner::GetProjectionSchema method call, found that
> > the
> > >> key
> > >> > > was
> > >> > > > > > being
> > >> > > > > > > dropped, which makes this much more challenging to test.
> Any
> > >> > ideas
> > >> > > if
> > >> > > > > it
> > >> > > > > > is
> > >> > > > > > > intended to drop the key information in a scanner
> > projection?
> > >> I
> > >> > > would
> > >> > > > > > > imagine this could prevent functionality like creating new
> > >> tables
> > >> > > > from
> > >> > > > > a
> > >> > > > > > > projection.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Jordan
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Reply via email to