Miklos Szurap created IMPALA-12918:
--------------------------------------
Summary: Do not allow non-numeric values in Hive table stats
during an alter table
Key: IMPALA-12918
URL: https://issues.apache.org/jira/browse/IMPALA-12918
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Miklos Szurap
Hive table properties are strings by nature; however, some of them have
special meaning and should hold numeric values, such as "totalSize",
"numRows" and "rawDataSize".
Impala currently allows these to be set to non-numeric values (including the
empty string).
From certain applications (such as Spark) we get quite obscure
"NumberFormatException" errors while trying to access such broken tables (see
SPARK-47444).
Impala should also validate "alter table" statements and not allow non-numeric
values in the "totalSize", "numRows", "rawDataSize" table properties.
For example, a query that may break the table (afterwards Spark can no longer
read it):
{code}
[impalacoordinator:21000] default> alter table t1p set
tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
{code}
Note: beeline/Hive already validates alter table statements for "numRows" and
"rawDataSize"; "totalSize" still needs validation there too.
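
A minimal sketch of the requested check, written in Java for illustration only.
The class name NumericStatsPropertyCheck and its validate() method are
hypothetical and are not existing Impala code. The sketch also assumes that any
value parseable as a signed 64-bit integer is acceptable; the exact accepted
range is a design decision for the actual fix.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Impala: rejects non-numeric values for the
// stats-related table properties named in this report.
final class NumericStatsPropertyCheck {
  // The three properties this report asks to validate.
  static final Set<String> NUMERIC_STATS_KEYS =
      Set.of("totalSize", "numRows", "rawDataSize");

  private NumericStatsPropertyCheck() {}

  // Throws IllegalArgumentException if any of the stats properties present in
  // tblProperties has a value that does not parse as a long. Other properties
  // are ignored. Assumption: negative values are allowed to pass.
  static void validate(Map<String, String> tblProperties) {
    for (String key : NUMERIC_STATS_KEYS) {
      if (!tblProperties.containsKey(key)) continue;
      String value = tblProperties.get(key);
      try {
        // Long.parseLong rejects null, the empty string and non-digit text.
        Long.parseLong(value);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Table property '" + key + "' must be numeric, got: '" + value + "'");
      }
    }
  }
}
```

With such a check applied to the property map of the statement above,
'numRows'='' would be rejected at analysis time, while unrelated properties
such as 'STATS_GENERATED_VIA_STATS_TASK' would pass through unchanged.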
--
This message was sent by Atlassian Jira
(v8.20.10#820010)