Looks like IMPALA-6307.

On Fri, Jan 26, 2018 at 5:06 PM, John Russell <[email protected]> wrote:
> Hi all,
>
> I wanted to experiment with a few different partitioning layouts, so I ran
> CTAS statements with PARTITIONED BY clauses against various tables and views
> in my schema. One CTAS failed with a “duplicate column” error:
>
> [localhost:21000] > desc report_categories;
> +----------------+-----------+---------+
> | name           | type      | comment |
> +----------------+-----------+---------+
> | ip             | string    |         |
> | f2             | string    |         |
> | f3             | string    |         |
> | the_date       | timestamp |         |
> | method         | string    |         |
> | path           | string    |         |
> | status         | smallint  |         |
> | size           | bigint    |         |
> | referer        | string    |         |
> | agent          | string    |         |
> | is_search_term | boolean   |         |
> | search_term    | string    |         |
> | is_doc_page    | boolean   |         |
> | doc_page       | string    |         |
> | category       | string    |         |
> | version        | string    |         |
> | format         | string    |         |
> | yy             | int       |         |
> | mm             | int       |         |
> | dd             | int       |         |
> +----------------+-----------+---------+
> [localhost:21000] > create table report_categories_by_status_format_yy_mm
>   partitioned by (`status`, `format`, yy, mm) stored as parquet
>   as
>   select ip, f2, f3, the_date, method, path, size, referer, agent,
>     is_search_term, search_term, is_doc_page, doc_page, category, version,
>     dd, `status`, `format`, yy, mm
>   from report_categories;
> ERROR: AnalysisException: Duplicate column name: status
>
> The interesting thing is that REPORT_CATEGORIES is a view that selects a
> bunch of columns (including one named `STATUS`) via SELECT * from an
> underlying table. If I make a real table with the same column definitions
> as the REPORT_CATEGORIES view, the same CTAS works when selecting from the
> real table.
>
> Is it expected that using a column X in a view definition would prevent a
> CTAS from creating a partitioned table with X as one of the partition key
> columns? (This is on Impala 2.7, BTW.)
>
> Thanks,
> John
