[
https://issues.apache.org/jira/browse/IMPALA-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825470#comment-17825470
]
Quanlong Huang commented on IMPALA-12889:
-----------------------------------------
Here is where catalogd processes the request to change the file format:
https://github.com/apache/impala/blob/085b1806da6a1941200288a2f9a243e389e10820/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1193-L1204
reloadTableSchema is not modified on this code path, so it stays false and the
Avro schema is not reloaded.
> Changing file format to AVRO doesn't update schema using 'avro.schema.url'
> --------------------------------------------------------------------------
>
> Key: IMPALA-12889
> URL: https://issues.apache.org/jira/browse/IMPALA-12889
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Priority: Major
> Labels: ramp-up
> Attachments: alltypes.json
>
>
> When changing the file format of a table to AVRO, the schema is not updated
> if there is a tblproperty of 'avro.schema.url'. However, after a REFRESH, the
> schema is updated:
> {code:sql}
> create table my_part_tbl(i int) partitioned by (p int) stored as parquet;
> alter table my_part_tbl set tblproperties(
>   'avro.schema.url'='hdfs:////test-warehouse/avro_schemas/functional/alltypes.json');
> alter table my_part_tbl set fileformat avro;
> describe my_part_tbl
> +------+------+---------+
> | name | type | comment |
> +------+------+---------+
> | i    | int  |         |
> | p    | int  |         |
> +------+------+---------+
> refresh my_part_tbl;
> describe my_part_tbl
> +-----------------+---------+-------------------+
> | name            | type    | comment           |
> +-----------------+---------+-------------------+
> | id              | int     | from deserializer |
> | bool_col        | boolean | from deserializer |
> | tinyint_col     | int     | from deserializer |
> | smallint_col    | int     | from deserializer |
> | int_col         | int     | from deserializer |
> | bigint_col      | bigint  | from deserializer |
> | float_col       | float   | from deserializer |
> | double_col      | double  | from deserializer |
> | date_string_col | string  | from deserializer |
> | string_col      | string  | from deserializer |
> | timestamp_col   | string  | from deserializer |
> | p               | int     |                   |
> +-----------------+---------+-------------------+
> {code}
> Note that explicitly setting the tblproperty after changing the file format
> to AVRO does refresh the schema. I.e. changing fileformat before setting
> 'avro.schema.url' works, but setting 'avro.schema.url' before changing
> fileformat doesn't work.
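> For reference, the working order from the note above can be sketched as
> follows. This is only an illustration: the table name my_part_tbl2 is
> hypothetical, and the schema URL is the one used in the reproduction.
> {code:sql}
> -- Hypothetical fresh table, so that 'avro.schema.url' is not already set.
> create table my_part_tbl2(i int) partitioned by (p int) stored as parquet;
> -- Change the file format first ...
> alter table my_part_tbl2 set fileformat avro;
> -- ... then set the schema URL. Per the note above, this order reloads the schema.
> alter table my_part_tbl2 set tblproperties(
>   'avro.schema.url'='hdfs:////test-warehouse/avro_schemas/functional/alltypes.json');
> describe my_part_tbl2;
> {code}
> If the property was set before the file format change, as in the
> reproduction, running REFRESH on the table also brings the schema up to date.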