[
https://issues.apache.org/jira/browse/IMPALA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324270#comment-17324270
]
Csaba Ringhofer commented on IMPALA-10350:
------------------------------------------
I looked into the review and would add some thoughts here:
>(I'm using the jdbc client because the impala shell rounds doubles for me)
An easy workaround is to use impala-shell with "--protocol beeswax", as Beeswax
returns strings for all types, so the double->string conversion is done in C++.
The issue in HS2 is simply that impala-shell receives the double from Thrift
correctly, but converts it to a string with Python's str(), which uses fewer
significant digits than the 17 that are usual in C++.
Created a ticket about this: IMPALA-10660
> Hive writes don't lose precision. If the data was written by Hive then Impala
> can read the values correctly:
I am not sure that Hive is more correct than Impala.
For select cast('-0.43149576573887316' as double); Hive returns the exact
number. But it returns the same number ...316 for every input from ...314 to
...318, so Hive also rounds at this precision, it just does it differently
than Impala. AFAIK Hive actually deals with decimals in a less standard way
than Impala.
According to
https://en.wikipedia.org/wiki/Double-precision_floating-point_format :
"if a decimal string with at most 15 significant digits is converted to IEEE
754 double-precision representation, and then converted back to a decimal
string with the same number of digits, the final result should match the
original string."
-0.43149576573887316 has 17 significant digits, and at 16-17 digits a double
is not guaranteed to represent a decimal string without loss (while a "normal"
double should be convertible to a decimal string of 17 digits and back without
loss).
>https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h
As we already include this library, we could try to use it for its original
purpose as well (parsing strings to doubles), since we have a similar
precision issue there:
select cast("0.43149576573887316" as double);
result: 0.4314957657388731
Hive returns a different result in this case too.
We convert string to double with a "home-made" function:
https://github.com/apache/impala/blob/master/be/src/util/string-parser.h#L459
[~boroknagyz][~amargoor]
What do you think: should I create a new ticket for string->double, or can you
add it to the scope of this one? I think it would make sense to solve both in
one patch, as string conversion issues may creep in unexpectedly, e.g. when
reading text files, and affect the tests.
> Impala loses double precision because of DECIMAL->DOUBLE cast
> -------------------------------------------------------------
>
> Key: IMPALA-10350
> URL: https://issues.apache.org/jira/browse/IMPALA-10350
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Amogh Margoor
> Priority: Major
> Labels: correctness, ramp-up
> Attachments: test.c
>
>
> Impala might lose precision of double values. Reproduction:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> insert into double_tbl values (-0.43149576573887316);
> {noformat}
> Then inspect the data file:
> {noformat}
> $ hdfs dfs -cat
> /test-warehouse/double_tbl/424097c644088674-c55b910100000000_175064830_data.0.txt
> -0.4314957657388731{noformat}
> The same happens if we store our data in Parquet.
> Hive writes don't lose precision. If the data was written by Hive then Impala
> can read the values correctly:
> {noformat}
> $ bin/run-jdbc-client.sh -t NOSASL -q "select * from double_tbl;"
> Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
> Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
> Executing: select * from double_tbl
> ----[START]----
> -0.43149576573887316
> ----[END]----{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)