[
https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688645#comment-16688645
]
Sahil Takiar commented on IMPALA-7087:
--------------------------------------
Do we want to tackle Parquet files that have a higher scale than the table
(e.g. a Parquet file written with scale = 4 being loaded into a table with
scale = 2)? This seems to be a valid pattern in other databases: the returned
values simply have their least significant digits truncated. Here is what
other SQL engines do:
*Postgres:*
Postgres is able to load data with a higher scale into a table with a lower
scale.
{code:sql}
postgres@stakiar-desktop:~$ printf "col1\n1.111" > /tmp/tmp.txt
test=# create table dec_test (dec_col decimal(10,2));
test=# copy dec_test(dec_col) from '/tmp/tmp.txt' delimiter ',' csv header;
test=# select * from dec_test;
dec_col
---------
1.11
(1 row)
{code}
The data was written to {{/tmp/tmp.txt}} as {{1.111}}, but is returned by
Postgres as {{1.11}}.
*Hive:*
Hive follows the same behavior as Postgres.
{code:sql}
create table dec_test_high_scale (dec_col decimal(10,4)) stored as parquet;
insert into table dec_test_high_scale values (1.1111);
create table dec_test_low_scale (dec_col decimal(10,2)) stored as parquet
location 'hdfs://[nn]:[port]/user/hive/warehouse/dec_test_high_scale';
select * from dec_test_low_scale;
1.11
{code}
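The conversion both engines perform can be sketched as integer arithmetic on the unscaled value, since Parquet stores a decimal as an unscaled integer plus a scale (value = unscaled × 10^-scale). A minimal illustration in Python — {{rescale}} is a hypothetical helper, not Impala or Hive code, and it models the behavior the comment describes (truncation of the least significant digits):

```python
from decimal import Decimal

def rescale(unscaled: int, file_scale: int, table_scale: int) -> int:
    """Convert an unscaled decimal from file_scale to table_scale.

    Hypothetical sketch: lowering the scale drops the least
    significant digits (truncation toward zero), as in the
    Postgres and Hive examples above; raising it pads with
    trailing zeros and is always lossless.
    """
    if table_scale >= file_scale:
        # Widening: multiply by a power of ten, no data loss.
        return unscaled * 10 ** (table_scale - file_scale)
    # Narrowing: divide away the extra digits, truncating toward zero.
    divisor = 10 ** (file_scale - table_scale)
    sign = 1 if unscaled >= 0 else -1
    return sign * (abs(unscaled) // divisor)

# 1.1111 stored with scale 4 is the unscaled integer 11111;
# reading it into a table with scale 2 yields 1.11.
u = rescale(11111, 4, 2)
print(Decimal(u).scaleb(-2))  # 1.11
```

Whether a narrowing read should truncate (as shown here) or round is a separate policy question; the values in the examples above give the same result either way.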
> Impala is unable to read Parquet decimal columns with lower precision/scale
> than table metadata
> -----------------------------------------------------------------------------------------------
>
> Key: IMPALA-7087
> URL: https://issues.apache.org/jira/browse/IMPALA-7087
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Sahil Takiar
> Priority: Major
> Labels: decimal, parquet
>
> This is similar to IMPALA-2515, except it relates to a different
> precision/scale in the file metadata rather than just a mismatch in the bytes
> used to store the data. In many cases we should be able to convert the
> decimal type on the fly to the higher-precision type.
> {noformat}
> ERROR: File '/hdfs/path/000000_0_x_2' column 'alterd_decimal' has an invalid
> type length. Expecting: 11 len in file: 8
> {noformat}
> It would be convenient to allow reading Parquet files where the
> precision/scale in the file can be converted to the precision/scale in the
> table metadata without loss of precision.