Threw in a helper function in the SchemaBuilder to at least make it a
little cleaner to construct a new schema with a key from the projection
schema
ex:
builder = kudu.schema_builder()
for col in scanner.get_projection_schema():
builder.copy_column(col)
builder.set_primary_keys(['key'])
On Fri, Sep 9, 2016 at 3:10 PM Jordan Birdsell <[email protected]>
wrote:
> I follow now, thanks
>
> On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <[email protected]> wrote:
>
>> Agreed with what David wrote. The dropping of 'key' status in projections
>> is intentional. It's the same as you get in MySQL if you have something
>> like:
>>
>> CREATE TABLE t (
>> x integer primary key,
>> y integer
>> );
>>
>> CREATE TABLE t2 AS SELECT * FROM t;
>>
>> the 'select' results don't carry any notion of which columns were
>> originally keys.
>>
>> In Kudu it smells a little funny today because we have two restrictions
>> that MySQL doesn't have:
>> 1) our keys must be listed first in the schema
>> 2) all tables must have primary keys
>>
>> However, we're in the process of lifting the first restriction (hoping to
>> finish it in the coming weeks), and I wouldn't be surprised if at some
>> point we lifted the second as well (some tables work just fine as 'bags'
>> with no need for PK constraints or organization).
>>
>> -Todd
>>
>>
>> On Fri, Sep 9, 2016 at 9:01 AM, David Alves <[email protected]>
>> wrote:
>>
>> > Oh I see what you mean.
>> > It's not that the keys are getting dropped, its that they're not marked
>> as
>> > keys.
>> > This arguably makes sense on a projection: for instance you might want
>> the
>> > keys returns in the end of the projection, while table schemas (at least
>> > for now) require that they are present at the beginning of the
>> projection.
>> > If you really want to create a new table based on an existing one, you
>> > could get the schema from KuduTable. That one should be complete.
>> >
>> > -david
>> >
>> > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell <
>> [email protected]
>> > >
>> > wrote:
>> >
>> > > Right, what i'm saying is, if i do include the key in my projection,
>> the
>> > > schema does not maintain it as a key. The issue isnt so much that i
>> cant
>> > > apply predicates to the key column, its that if i wanted to create a
>> > > projection and then want to use that projection to create a table
>> based
>> > on
>> > > that projection, i'd have to rebuild the schema (i.e., the schema
>> > returned
>> > > is effectively useless for creating new tables). This pattern of
>> > creating
>> > > tables from projections is pretty common in dataframe like libraries
>> in
>> > > python.
>> > >
>> > > gist of offending code with comment on issue:
>> > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7
>> > >
>> > >
>> > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <[email protected]>
>> > wrote:
>> > >
>> > > > Wait, If you _do_ set a projection on the scanner that does not
>> include
>> > > the
>> > > > keys, then they won't be returned (and won't appear on the
>> projection's
>> > > > schema).
>> > > > Note that this does not mean that you can't set predicates on the
>> key,
>> > > it's
>> > > > just that they'll be evaluated server side, but the key won't
>> actually
>> > be
>> > > > returned.
>> > > > Maybe I'm misunderstanding what you're saying?
>> > > > Care to post a gist with the offending code?
>> > > >
>> > > > -david
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell <
>> > > [email protected]
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > Hey David,
>> > > > >
>> > > > > Yep, i'm sure, taking a look at the scan_configuration class, the
>> > issue
>> > > > > seems to be here:
>> > > > >
>> > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const
>> > vector<int>&
>> > > > > col_indexes) {
>> > > > > ....
>> > > > > RETURN_NOT_OK*(s->Reset(cols, 0));*
>> > > > > ....
>> > > > >
>> > > > > In the SetProjectedColumnIndexes method (which is also used by
>> > > > > SetProjectedColumnNames), we're setting the schema without the
>> index.
>> > > > >
>> > > > > There are probably a couple of ways to address this:
>> > > > >
>> > > > > 1. Check if all key columns are in the projection, and if so,
>> > > maintain
>> > > > > the key.
>> > > > > 2. Provide an optional parameter to be able to set the key to
>> > users
>> > > > > preference for the new projection. This would be beneficial for
>> > > cases
>> > > > > where
>> > > > > the user may want to create a new table based on their
>> projection.
>> > > > >
>> > > > > Thoughts?
>> > > > >
>> > > > > Jordan
>> > > > >
>> > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <
>> [email protected]>
>> > > > wrote:
>> > > > >
>> > > > > > Hi Jordan
>> > > > > >
>> > > > > > KuduScanner::GetProjectionSchema returns the schema of the
>> > > projection
>> > > > > > that was previously set on the scanner. If you don't a
>> projection
>> > it
>> > > > > should
>> > > > > > indeed return all the columns.
>> > > > > > Are you sure you didn't set a projection (with
>> > > > SetProjectedColumnNames
>> > > > > > or SetProjectedColumnIndexes) that excluded the key?
>> > > > > >
>> > > > > > Best
>> > > > > > David
>> > > > > >
>> > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell <
>> > > > > [email protected]
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hey folks,
>> > > > > > >
>> > > > > > > i was doing some work on KUDU-854
>> > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when
>> > testing
>> > > > the
>> > > > > > > KuduScanner::GetProjectionSchema method call, found that the
>> key
>> > > was
>> > > > > > being
>> > > > > > > dropped, which makes this much more challenging to test. Any
>> > ideas
>> > > if
>> > > > > it
>> > > > > > is
>> > > > > > > intended to drop the key information in a scanner projection?
>> I
>> > > would
>> > > > > > > imagine this could prevent functionality like creating new
>> tables
>> > > > from
>> > > > > a
>> > > > > > > projection.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Jordan
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>