[ 
https://issues.apache.org/jira/browse/IMPALA-12918?focusedWorklogId=999589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-999589
 ]

ASF GitHub Bot logged work on IMPALA-12918:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Jan/26 12:13
            Start Date: 12/Jan/26 12:13
    Worklog Time Spent: 10m 
      Work Description: KunalSiyag opened a new pull request, #84:
URL: https://github.com/apache/impala/pull/84

   Adds validation to reject non-numeric values (including empty strings) for 
numRows, totalSize, and rawDataSize table properties during ALTER TABLE SET 
TBLPROPERTIES operations.
   
   Previously, Impala allowed setting these properties to any string value, 
which could cause NumberFormatException errors in downstream applications like 
Spark when they tried to parse these values.
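
To illustrate the failure mode described above, here is a minimal sketch of how a naive consumer might hit the exception. `StatsReaderDemo` and its methods are hypothetical names chosen for illustration; this is not Spark's actual code, only a demonstration of how `Long.parseLong` behaves on such values.

```java
// Hypothetical stand-in for a downstream reader; NOT Spark's real implementation.
final class StatsReaderDemo {
    /** Parses a stats property the way a naive consumer might. */
    static long readStat(String rawValue) {
        // Long.parseLong does not trim and rejects "", " " and "abc"
        // with a NumberFormatException.
        return Long.parseLong(rawValue);
    }

    /** Returns true if parsing the value fails with NumberFormatException. */
    static boolean failsToParse(String rawValue) {
        try {
            readStat(rawValue);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```

With this sketch, `failsToParse("")` and `failsToParse("abc")` return true, while `failsToParse("100")` returns false.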
   
   Changes:
   - Added analyzeTableStatsProperties() method in 
AlterTableSetTblProperties.java that validates that these properties contain 
parseable long values
   - Added corresponding test cases in AnalyzeDDLTest.java
   - Added new test file TableStatsValidationBugTest.java with comprehensive 
tests
   
   Testing:
   - All validation logic verified with unit tests
   - Tested that empty strings, non-numeric values, and whitespace-only values 
are correctly rejected with clear error messages
   - Tested that valid numeric values (including 0, -1, positive numbers) pass
   
   Example error messages:
   - "Table property 'numRows' must have a valid numeric value, got empty 
value."
   - "Table property 'numRows' must have a valid numeric value, got 'abc'."
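
The kind of check described in this pull request could look roughly like the sketch below. It is not the code from the PR: the class `TableStatsPropertyCheck` and its methods are hypothetical, errors are returned as an `Optional<String>` rather than thrown as an analysis exception, property names are matched case-sensitively, and a null value is treated like an empty one. All of these are assumptions made for illustration. Only the two message formats are taken from the examples above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not the method added by the pull request.
final class TableStatsPropertyCheck {
    // Properties that must hold a value parseable as a long.
    private static final Set<String> NUMERIC_STATS_KEYS =
        Set.of("numRows", "totalSize", "rawDataSize");

    /** Returns an error message for an invalid stats value, or empty if acceptable. */
    static Optional<String> validate(String key, String value) {
        if (!NUMERIC_STATS_KEYS.contains(key)) {
            return Optional.empty();  // other properties stay free-form strings
        }
        // Assumption: null is reported the same way as an empty string.
        if (value == null || value.isEmpty()) {
            return Optional.of("Table property '" + key
                + "' must have a valid numeric value, got empty value.");
        }
        try {
            Long.parseLong(value);  // accepts 0, -1 and positive numbers
            return Optional.empty();
        } catch (NumberFormatException e) {
            // Also reached for whitespace-only values, since parseLong does not trim.
            return Optional.of("Table property '" + key
                + "' must have a valid numeric value, got '" + value + "'.");
        }
    }

    /** Returns the first error found in a property map, or empty if all pass. */
    static Optional<String> firstError(Map<String, String> props) {
        for (Map.Entry<String, String> entry : props.entrySet()) {
            Optional<String> error = validate(entry.getKey(), entry.getValue());
            if (error.isPresent()) {
                return error;
            }
        }
        return Optional.empty();
    }
}
```

Returning the message instead of throwing keeps the sketch easy to exercise in isolation; a real analyzer would more likely raise its own exception type at the point of failure.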




Issue Time Tracking
-------------------

    Worklog Id:     (was: 999589)
    Time Spent: 0.2h  (was: 2m)

> Do not allow non-numeric values in Hive table stats during an alter table
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-12918
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12918
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0
>            Reporter: Miklos Szurap
>            Assignee: Kunal Siyag
>            Priority: Major
>              Labels: alter, alter-table, catalog-2024, newbie, ramp-up, 
> stats, validation
>          Time Spent: 0.2h
>  Remaining Estimate: 0h
>
> Hive table properties are strings by nature; however, some of them have 
> special meaning and should have numeric values, such as "totalSize", 
> "numRows", and "rawDataSize". 
> Impala currently allows these to be set to non-numeric values (including 
> empty string).
> Certain applications (such as Spark) throw quite obscure 
> "NumberFormatException" errors when they try to access such broken tables 
> (see SPARK-47444).
> Impala should also validate "alter table" statements and not allow 
> non-numeric values in the "totalSize", "numRows", "rawDataSize" table 
> properties.
> For example, the following query may break the table (afterwards it can't 
> be read from Spark):
> {code}
> [impalacoordinator:21000] default> alter table t1p set tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
> {code}
> Note: beeline/Hive already validates "numRows" and "rawDataSize" in alter 
> table statements; "totalSize" still needs validation there too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
