https://issues.apache.org/jira/browse/IMPALA-6307 

Fix version/s: Impala 2.12.0

Perfect timing!

John
> On Jan 29, 2018, at 9:05 AM, Alexander Behm <[email protected]> wrote:
> 
> Looks like IMPALA-6307
> 
> On Fri, Jan 26, 2018 at 5:06 PM, John Russell <[email protected]> wrote:
> 
>> Hi all,
>> 
>> I wanted to experiment with a few different partitioning layouts, so I ran
>> CTAS statements with PARTITIONED BY clauses against various tables and
>> views in my schema.  One CTAS failed like so, with a “duplicate column” error:
>> 
>> [localhost:21000] > desc report_categories;
>> +----------------+-----------+---------+
>> | name           | type      | comment |
>> +----------------+-----------+---------+
>> | ip             | string    |         |
>> | f2             | string    |         |
>> | f3             | string    |         |
>> | the_date       | timestamp |         |
>> | method         | string    |         |
>> | path           | string    |         |
>> | status         | smallint  |         |
>> | size           | bigint    |         |
>> | referer        | string    |         |
>> | agent          | string    |         |
>> | is_search_term | boolean   |         |
>> | search_term    | string    |         |
>> | is_doc_page    | boolean   |         |
>> | doc_page       | string    |         |
>> | category       | string    |         |
>> | version        | string    |         |
>> | format         | string    |         |
>> | yy             | int       |         |
>> | mm             | int       |         |
>> | dd             | int       |         |
>> +----------------+-----------+---------+
>> [localhost:21000] > create table report_categories_by_status_format_yy_mm
>> partitioned by (`status`, `format`, yy, mm) stored as parquet
>> as
>> select ip, f2, f3, the_date, method, path, size, referer, agent,
>> is_search_term, search_term, is_doc_page, doc_page, category, version, dd,
>> `status`, `format`, yy, mm
>> from report_categories;
>> ERROR: AnalysisException: Duplicate column name: status
>> 
>> The interesting thing is, REPORT_CATEGORIES is a view that SELECTs a bunch
>> of columns (including one named `STATUS`) via SELECT * from an underlying
>> table.  If I make a real table with the same column definitions as the
>> REPORT_CATEGORIES view, then the above CTAS works when selecting from the
>> real table.
>> 
>> Is it to be expected that the use of a column X in a view definition would
>> prevent a CTAS from creating a partitioned table with X as one of the
>> partition key columns?  (This is on Impala 2.7 BTW.)
>> 
>> Thanks,
>> John
>> 
>> 
