Quanlong Huang (Jira) Thu, 24 Apr 2025 04:53:07 -0700

Quanlong Huang created IMPALA-13990:
---------------------------------------


             Summary: Allow padding/truncating decimal values from Parquet/ORC files
                 Key: IMPALA-13990
                 URL: https://issues.apache.org/jira/browse/IMPALA-13990
             Project: IMPALA
          Issue Type: New Feature
          Components: Backend
            Reporter: Quanlong Huang


When a decimal column has a different precision or scale in the table schema 
than in the file schema, Impala refuses to read the file to avoid losing 
precision.

For Parquet files, the error is
{code:java}
File 'hdfs://localhost:20500/test-warehouse/parq_tbl/000000_0' column 'd38_18' 
has a precision that does not match the table metadata precision. File metadata 
precision: 38, table metadata precision: 22. {code}
For ORC files, the error is
{code:java}
Type mismatch: table column DECIMAL(22,6) is map to column decimal(38,18) in 
ORC file 'hdfs://localhost:20500/test-warehouse/tbl/000000_0'{code}
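
To make the mismatch concrete, here is a minimal Python sketch (not Impala code) that classifies a read of a file column DECIMAL(file_p, file_s) into a table column DECIMAL(tbl_p, tbl_s). The function name and the return labels are made up for illustration. It relies only on the arithmetic fact that no digits can be lost when the table type has at least as many fractional digits and at least as many integer digits as the file type.

```python
def classify_decimal_read(file_p, file_s, tbl_p, tbl_s):
    """Hypothetical helper, not an Impala API.

    Returns "exact" if the types match, "pad" if every file value fits
    the table type without losing digits, and "lossy" otherwise.
    """
    if (file_p, file_s) == (tbl_p, tbl_s):
        return "exact"
    # Digits available to the left of the decimal point.
    file_int_digits = file_p - file_s
    tbl_int_digits = tbl_p - tbl_s
    if tbl_s >= file_s and tbl_int_digits >= file_int_digits:
        return "pad"
    return "lossy"


# The case from the errors above: file decimal(38,18), table DECIMAL(22,6).
print(classify_decimal_read(38, 18, 22, 6))   # lossy
# The opposite direction only needs zero padding.
print(classify_decimal_read(22, 6, 38, 18))   # pad
```
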
Hive supports this scenario:
{code:sql}
create external table tbl (d22_6 decimal(22,6), d38_18 decimal(38,18)) stored 
as orc;
insert into tbl select pi(), pi();
select * from tbl;
+------------+-----------------------+
| tbl.d22_6  |      tbl.d38_18       |
+------------+-----------------------+
| 3.141593   | 3.141592653589793000  |
+------------+-----------------------+

-- create a new table pointing to the above location, using decimal(22,6) for the second column
create external table tbl2 (d22_6 decimal(22,6), d38_18 decimal(22,6)) stored 
as orc location '/test-warehouse/tbl';
select * from tbl2;
+-------------+--------------+
| tbl2.d22_6  | tbl2.d38_18  |
+-------------+--------------+
| 3.141593    | 3.141593     |
+-------------+--------------+{code}

Although this loses precision, it is still helpful to show the truncated values. 
We can add a query option that allows this behavior, consistent with Hive.
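
As a rough illustration of the per-value conversion, here is a Python sketch using the standard decimal module. It is not the proposed Impala implementation, and fit_decimal is a hypothetical name. Note that the Hive output above (3.141593) matches rounding half up; strict truncation of 3.141592653589793 would give 3.141592, so the rounding mode is left as a parameter. Returning None when the integer part does not fit is an assumption made for this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN, localcontext


def fit_decimal(value, precision, scale, rounding=ROUND_HALF_UP):
    """Hypothetical helper: rescale value to DECIMAL(precision, scale).

    Pads with zeros when scale grows, rounds when it shrinks, and
    returns None (assumed overflow behavior) if the integer part
    needs more than precision - scale digits.
    """
    with localcontext() as ctx:
        ctx.prec = 80  # wide enough for 38-digit decimals
        quantum = Decimal((0, (1,), -scale))  # e.g. scale=6 -> 0.000001
        rescaled = value.quantize(quantum, rounding=rounding)
    sign, digits, exponent = rescaled.as_tuple()
    int_digits = max(len(digits) + exponent, 0)
    if int_digits > precision - scale:
        return None
    return rescaled


file_value = Decimal("3.141592653589793000")       # stored as decimal(38,18)
print(fit_decimal(file_value, 22, 6))              # 3.141593
print(fit_decimal(file_value, 22, 6, ROUND_DOWN))  # 3.141592
print(fit_decimal(Decimal("3.141593"), 38, 18))    # 3.141593000000000000
print(fit_decimal(Decimal("12345.678"), 6, 2))     # None
```
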



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
