[ 
https://issues.apache.org/jira/browse/IMPALA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320122#comment-17320122
 ] 

Zoltán Borók-Nagy commented on IMPALA-10350:
--------------------------------------------

Thanks for pointing this out [~amargoor]. AFAICT it only affects TEXT tables. 
Precision 16 was chosen in a very old 
[commit|https://github.com/apache/impala/commit/ed2edac79d2b69eadff28c0764b90ecfccdfeb47].
 I think it makes sense to increase it to 20, but that will require updating a 
lot of tests, so we might want to do it in a separate sub-task. It can also 
increase the size of data files, so I think it's better to discuss separately 
whether we want to do that. In the [Impala 
docs|https://impala.apache.org/docs/build/html/topics/impala_double.html] we 
only guarantee 15-17 significant digits of precision anyway.
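
To illustrate the precision trade-off with the value from the issue 
description, here is a minimal Python sketch. Python's printf-style 
formatting is only a stand-in for a text writer here, not Impala's actual 
code, and the helper name round_trips is hypothetical. It checks whether a 
double survives a round trip through text at a given number of significant 
digits:

```python
def round_trips(value: float, digits: int) -> bool:
    """Return True if formatting with `digits` significant digits is lossless."""
    text = format(value, f".{digits}g")
    return float(text) == value


d = -0.43149576573887316  # value from the reproduction in the issue

for digits in (16, 17, 20):
    text = format(d, f".{digits}g")
    # Print the precision, the text form, and whether parsing it gives d back.
    print(digits, text, round_trips(d, digits))
```

For this value 16 digits are not enough, while 17 are. 17 significant digits 
are enough to round-trip any IEEE 754 double, so a higher precision mainly 
makes the text representation longer.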

AFAICT IMPALA-10654 provides better precision for data in PARQUET tables 
(without affecting the file size), and it's also more accurate whenever we 
convert decimals to doubles, so the patch already has great value.
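
As a general illustration of why the decimal-to-double path matters, here is 
a sketch of one possible failure mode. It is not the actual Impala code 
before or after IMPALA-10654, and both function names are hypothetical. 
Converting the unscaled integer to a double and then dividing rounds twice, 
while a correctly rounded conversion rounds only once:

```python
from decimal import Decimal


def naive_decimal_to_double(unscaled: int, scale: int) -> float:
    # Rounds once when the unscaled value becomes a double (it exceeds 2**53),
    # then a second time in the floating-point division.
    return float(unscaled) / float(10 ** scale)


def exact_decimal_to_double(unscaled: int, scale: int) -> float:
    # Decimal -> float is correctly rounded, so there is a single rounding step.
    return float(Decimal(unscaled).scaleb(-scale))


# -0.43149576573887316 expressed as an unscaled integer with 17 fractional digits
unscaled, scale = -43149576573887316, 17

print(repr(naive_decimal_to_double(unscaled, scale)))
print(repr(exact_decimal_to_double(unscaled, scale)))
```

For this input the two results differ, which shows how a cast done in two 
rounding steps can land on a neighbouring double.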

> Impala loses double precision because of DECIMAL->DOUBLE cast
> -------------------------------------------------------------
>
>                 Key: IMPALA-10350
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10350
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Amogh Margoor
>            Priority: Major
>              Labels: correctness, ramp-up
>         Attachments: test.c
>
>
> Impala might lose precision of double values. Reproduction: 
> {noformat}
> create table double_tbl (d double) stored as textfile;
> insert into double_tbl values (-0.43149576573887316);
> {noformat}
>  Then inspect the data file:
> {noformat}
> $ hdfs dfs -cat /test-warehouse/double_tbl/424097c644088674-c55b910100000000_175064830_data.0.txt
> -0.4314957657388731
> {noformat}
> The same happens if we store our data in Parquet.
> Hive writes don't lose precision. If the data was written by Hive then Impala 
> can read the values correctly:
> {noformat}
> $ bin/run-jdbc-client.sh -t NOSASL -q "select * from double_tbl;"
> Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
> Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
> Executing: select * from double_tbl
> ----[START]----
> -0.43149576573887316
> ----[END]----{noformat}



