Quanlong Huang created IMPALA-13990:
---------------------------------------
Summary: Allow padding/truncating decimal values from Parquet/ORC files
Key: IMPALA-13990
URL: https://issues.apache.org/jira/browse/IMPALA-13990
Project: IMPALA
Issue Type: New Feature
Components: Backend
Reporter: Quanlong Huang
When a decimal column has a different precision or scale in the table schema
than in the file schema, Impala refuses to read the file to avoid losing
precision.
For Parquet files, the error is
{code:java}
File 'hdfs://localhost:20500/test-warehouse/parq_tbl/000000_0' column 'd38_18'
has a precision that does not match the table metadata precision. File metadata
precision: 38, table metadata precision: 22. {code}
For ORC files, the error is
{code:java}
Type mismatch: table column DECIMAL(22,6) is map to column decimal(38,18) in
ORC file 'hdfs://localhost:20500/test-warehouse/tbl/000000_0'{code}
Hive supports this scenario:
{code:sql}
create external table tbl (d22_6 decimal(22,6), d38_18 decimal(38,18)) stored
as orc;
insert into tbl select pi(), pi();
select * from tbl;
+------------+-----------------------+
| tbl.d22_6  | tbl.d38_18            |
+------------+-----------------------+
| 3.141593   | 3.141592653589793000  |
+------------+-----------------------+
-- create a new table pointing to the above location, using decimal(22,6) for
-- the second column
create external table tbl2 (d22_6 decimal(22,6), d38_18 decimal(22,6)) stored
as orc location '/test-warehouse/tbl';
select * from tbl2;
+-------------+--------------+
| tbl2.d22_6  | tbl2.d38_18  |
+-------------+--------------+
| 3.141593    | 3.141593     |
+-------------+--------------+{code}
Although precision is lost, showing the truncated values is still helpful. We
can add a query option that allows this behavior, for consistency with Hive.
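
To make the requested behavior concrete, here is a minimal Python sketch of fitting a file value into the table's DECIMAL(precision, scale). It is not Impala or Hive code: the helper name fit_decimal, the HALF_UP rounding mode, and returning None on overflow are all assumptions for illustration. HALF_UP was chosen because the Hive output above (3.141593) is what rounding produces; dropping the extra digits outright would give 3.141592.

```python
from decimal import Context, Decimal, ROUND_HALF_UP
from typing import Optional

# Wide enough that rescaling a 38-digit decimal never hits the context limit.
_WIDE = Context(prec=76)


def fit_decimal(value: Decimal, precision: int, scale: int) -> Optional[Decimal]:
    """Hypothetical helper: fit `value` into DECIMAL(precision, scale).

    A larger target scale pads with trailing zeros; a smaller one rounds
    away the extra fractional digits (HALF_UP is an assumption here).
    Returns None when the integer part does not fit the target type.
    """
    # Exact 10^-scale, e.g. Decimal('0.000001') for scale 6.
    quantum = Decimal((0, (1,), -scale))
    rescaled = value.quantize(quantum, rounding=ROUND_HALF_UP, context=_WIDE)
    # The target type holds at most (precision - scale) integer digits.
    if rescaled.copy_abs() >= 10 ** (precision - scale):
        return None
    return rescaled


# File column decimal(38,18) read through a decimal(22,6) table column.
print(fit_decimal(Decimal("3.141592653589793000"), 22, 6))   # 3.141593
# The opposite direction pads with zeros.
print(fit_decimal(Decimal("3.141593"), 38, 18))              # 3.141593000000000000
# 20 integer digits cannot fit into decimal(22,6), which allows only 16.
print(fit_decimal(Decimal("12345678901234567890.5"), 22, 6)) # None
```

Whether a value that overflows should become NULL or raise an error, and which rounding mode applies, would be decided by the actual implementation behind the query option.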