[ 
https://issues.apache.org/jira/browse/IMPALA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-999585
 ]

Kunal Siyag logged work on IMPALA-12918:
----------------------------------------

                  Author: Kunal Siyag
               Edited by: Kunal Siyag
              Created on: 12/Jan/26 12:09
               Edited on: 12/Jan/26 12:22
              Start Date: 01/Jan/26 20:00 
      Worklog Time Spent: 336h (was: 2m)
        Work Description: Reproduced the bug on my system, then added
validation for the numeric table stats properties (numRows, totalSize,
rawDataSize) during ALTER TABLE SET TBLPROPERTIES operations.

Verified the validation logic with standalone tests (13 of 13 passing).
The following statements are now rejected:

ALTER TABLE t SET TBLPROPERTIES('numRows'='');           -- Blocked
ALTER TABLE t SET TBLPROPERTIES('numRows'='abc');        -- Blocked
ALTER TABLE t SET TBLPROPERTIES('totalSize'='');         -- Blocked
ALTER TABLE t SET TBLPROPERTIES('rawDataSize'='xyz');    -- Blocked
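
The check described above could look roughly like the following Python sketch. It is not the actual patch (Impala's frontend is written in Java). The function name `validate_stats_props`, the exact-case key matching, and the choice to accept negative integers (some tools write -1 for unknown stats) are all assumptions made for illustration.

```python
import re

# Property names taken from the issue description.
NUMERIC_STATS_PROPS = ("numRows", "totalSize", "rawDataSize")

# Assumption: an optional leading minus followed by ASCII digits only.
# This deliberately rejects "", "abc", " 12 " and "1_000".
_INTEGER_RE = re.compile(r"-?[0-9]+")


def validate_stats_props(props):
    """Return a list of error messages for non-numeric stats values.

    `props` is a dict of table property names to string values, as they
    would appear in ALTER TABLE ... SET TBLPROPERTIES. Properties that
    are not in NUMERIC_STATS_PROPS are ignored.
    """
    errors = []
    for key in NUMERIC_STATS_PROPS:
        if key not in props:
            continue
        value = props[key]
        if not isinstance(value, str) or not _INTEGER_RE.fullmatch(value):
            errors.append(f"{key}: expected an integer, got {value!r}")
    return errors


if __name__ == "__main__":
    # Mirrors the four blocked statements above, plus one valid case.
    for candidate in (
        {"numRows": ""},
        {"numRows": "abc"},
        {"totalSize": ""},
        {"rawDataSize": "xyz"},
        {"numRows": "100"},
    ):
        problems = validate_stats_props(candidate)
        print(candidate, "Blocked" if problems else "Allowed")
```

Returning a list of messages rather than raising on the first failure lets a caller report every bad property in a single statement, such as one that sets both 'numRows' and 'totalSize' to invalid values.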


Issue Time Tracking
-------------------

            Worklog Id:     (was: 999585)
    Remaining Estimate: 336h  (was: 0h)
            Time Spent: 336h 10m  (was: 0.2h)
        
> Do not allow non-numeric values in Hive table stats during an alter table
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-12918
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12918
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0
>            Reporter: Miklos Szurap
>            Assignee: Kunal Siyag
>            Priority: Major
>              Labels: alter, alter-table, catalog-2024, newbie, ramp-up, 
> stats, validation
>          Time Spent: 336h 10m
>  Remaining Estimate: 336h
>
> Hive table properties are strings by nature; however, some of them have 
> special meaning and should hold numeric values, such as "totalSize", 
> "numRows", and "rawDataSize". 
> Impala currently allows these to be set to non-numeric values (including 
> the empty string).
> From certain applications (such as Spark) we get quite obscure 
> "NumberFormatException" errors when trying to access such broken tables 
> (see SPARK-47444).
> Impala should also validate "alter table" statements and not allow 
> non-numeric values in the "totalSize", "numRows", "rawDataSize" table 
> properties.
> For example, a query which may break the table (afterwards it can't be read 
> from Spark):
> {code}
> [impalacoordinator:21000] default> alter table t1p set 
> tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
> {code}
> Note: beeline/Hive validates alter table statements for "numRows" and 
> "rawDataSize"; "totalSize" still needs validation there too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
