Hi all, I wanted to experiment with a few different partitioning layouts, so I ran CTAS statements with PARTITIONED BY clauses against various tables and views in my schema. One CTAS failed with a “duplicate column” error:
[localhost:21000] > desc report_categories;
+----------------+-----------+---------+
| name           | type      | comment |
+----------------+-----------+---------+
| ip             | string    |         |
| f2             | string    |         |
| f3             | string    |         |
| the_date       | timestamp |         |
| method         | string    |         |
| path           | string    |         |
| status         | smallint  |         |
| size           | bigint    |         |
| referer        | string    |         |
| agent          | string    |         |
| is_search_term | boolean   |         |
| search_term    | string    |         |
| is_doc_page    | boolean   |         |
| doc_page       | string    |         |
| category       | string    |         |
| version        | string    |         |
| format         | string    |         |
| yy             | int       |         |
| mm             | int       |         |
| dd             | int       |         |
+----------------+-----------+---------+

[localhost:21000] > create table report_categories_by_status_format_yy_mm
    partitioned by (`status`, `format`, yy, mm)
    stored as parquet
    as select ip, f2, f3, the_date, method, path, size, referer, agent,
      is_search_term, search_term, is_doc_page, doc_page, category, version,
      dd, `status`, `format`, yy, mm
    from report_categories;
ERROR: AnalysisException: Duplicate column name: status

The interesting thing is that REPORT_CATEGORIES is a view that SELECTs a bunch of columns (including one named `STATUS`) via SELECT * from an underlying table. If I make a real table with the same column definitions as the REPORT_CATEGORIES view, then the above CTAS works when selecting from the real table.

Is it expected that using a column X in a view definition would prevent a CTAS from creating a partitioned table with X as one of the partition key columns? (This is on Impala 2.7, BTW.)

Thanks,
John
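P.S. For anyone who wants to try this, here is a minimal sketch of the same setup with a cut-down schema. The names repro_base, repro_view, repro_part and repro_part_ctl are hypothetical, and I have not confirmed that this reduced case triggers the same error; it only mirrors the shape of the statements above (a view defined with SELECT *, then a CTAS that lists the partition key columns last).

```sql
-- Hypothetical names throughout; none of these exist in the original schema.
-- Base table with a reduced column list, including a column named status.
CREATE TABLE repro_base (ip STRING, `status` SMALLINT, yy INT);

-- View defined via SELECT * from the underlying table, like report_categories.
CREATE VIEW repro_view AS SELECT * FROM repro_base;

-- Same shape as the failing CTAS: partition key columns come last in the
-- select list and are named, without types, in PARTITIONED BY.
CREATE TABLE repro_part
  PARTITIONED BY (`status`, yy)
  STORED AS PARQUET
AS SELECT ip, `status`, yy FROM repro_view;

-- Control case: the identical CTAS, reading from the base table directly.
CREATE TABLE repro_part_ctl
  PARTITIONED BY (`status`, yy)
  STORED AS PARQUET
AS SELECT ip, `status`, yy FROM repro_base;
```

If the view-based CTAS fails while the control succeeds, that would isolate the view as the trigger instead of anything specific to my data.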
