[ https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688645#comment-16688645 ]

Sahil Takiar commented on IMPALA-7087:
--------------------------------------

Do we want to tackle Parquet files that have a higher scale than the table 
(e.g., a Parquet file written with scale = 4 being loaded into a table with 
scale = 2)? This seems to be a valid pattern in other databases: the returned 
values simply have their least significant digits dropped. Here is what other 
SQL engines do:

*Postgres:*

Postgres is able to load data with a higher scale into a table with a lower 
scale.

{code:java}
postgres@stakiar-desktop:~$ printf "col1\n1.111" > /tmp/tmp.txt
test=# create table dec_test (dec_col decimal(10,2));
test=# copy dec_test(dec_col) from '/tmp/tmp.txt' delimiter ',' csv header;
test=# select * from dec_test;
 dec_col 
---------
    1.11
(1 row)
{code}
The data was written to {{/tmp/tmp.txt}} as {{1.111}}, but is returned by 
Postgres as {{1.11}}.
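
As an aside, the {{1.111}} example cannot distinguish truncation from 
rounding; the two only diverge when the first dropped digit is 5 or more. A 
minimal, engine-agnostic Java sketch with {{java.math.BigDecimal}} (an 
illustration, not the code either engine actually runs):
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleDemo {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("1.111");
        // Both modes agree when the dropped digit is below 5.
        System.out.println(v.setScale(2, RoundingMode.DOWN));    // 1.11 (truncate)
        System.out.println(v.setScale(2, RoundingMode.HALF_UP)); // 1.11 (round)
        // They diverge once the dropped digit is 5 or more.
        BigDecimal w = new BigDecimal("1.119");
        System.out.println(w.setScale(2, RoundingMode.DOWN));    // 1.11
        System.out.println(w.setScale(2, RoundingMode.HALF_UP)); // 1.12
    }
}
{code}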

*Hive:*

Hive behaves the same way as Postgres: a table declared with a lower scale 
can read Parquet data written with a higher scale.
{code:java}
create table dec_test_high_scale (dec_col decimal(10,4)) stored as parquet;
insert into table dec_test_high_scale values (1.1111);
-- Point a lower-scale table at the higher-scale table's data directory.
create table dec_test_low_scale (dec_col decimal(10,2)) stored as parquet 
location 'hdfs://[nn]:[port]/user/hive/warehouse/dec_test_high_scale';
select * from dec_test_low_scale;
1.11
{code}
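
If Impala supports this, the on-the-fly conversion boils down to rescaling 
the unscaled integer that Parquet physically stores. A rough Java sketch of 
the idea follows; the method name and the choice to truncate when reducing 
the scale are assumptions for illustration (Impala's Parquet scanner is C++, 
so this is not the actual implementation):
{code:java}
import java.math.BigInteger;

public class ParquetRescaleSketch {
    // Hypothetical: convert the unscaled integer of a Parquet decimal from
    // the scale recorded in the file to the scale in the table metadata.
    static BigInteger rescale(BigInteger unscaled, int fileScale, int tableScale) {
        int delta = tableScale - fileScale;
        if (delta == 0) return unscaled;
        BigInteger pow = BigInteger.TEN.pow(Math.abs(delta));
        // Scaling up is exact; scaling down drops the least significant
        // digits (BigInteger.divide truncates toward zero). A real
        // implementation would also check that the result still fits the
        // table's precision and raise an error (or return NULL) on overflow.
        return delta > 0 ? unscaled.multiply(pow) : unscaled.divide(pow);
    }

    public static void main(String[] args) {
        // 1.1111 at file scale 4 (unscaled 11111) -> table scale 2: 111, i.e. 1.11
        System.out.println(rescale(BigInteger.valueOf(11111), 4, 2)); // 111
        // 1.11 at file scale 2 (unscaled 111) -> table scale 4: 11100, i.e. 1.1100
        System.out.println(rescale(BigInteger.valueOf(111), 2, 4));   // 11100
    }
}
{code}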

> Impala is unable to read Parquet decimal columns with lower precision/scale 
> than table metadata
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7087
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7087
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: decimal, parquet
>
> This is similar to IMPALA-2515, except it relates to a different 
> precision/scale in the file metadata rather than just a mismatch in the 
> bytes used to store the data. In a lot of cases we should be able to 
> convert the decimal type on the fly to the higher-precision type.
> {noformat}
> ERROR: File '/hdfs/path/000000_0_x_2' column 'alterd_decimal' has an invalid 
> type length. Expecting: 11 len in file: 8
> {noformat}
> It would be convenient to allow reading Parquet files where the 
> precision/scale in the file can be converted to the precision/scale in the 
> table metadata without loss of precision.


