[
https://issues.apache.org/jira/browse/IMPALA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324270#comment-17324270
]
Csaba Ringhofer commented on IMPALA-10350:
------------------------------------------
I looked into the review and would add some thoughts here:
>(I'm using the jdbc client because the impala shell rounds doubles for me)
An easy workaround is to use impala-shell with "--protocol beeswax", as Beeswax
returns strings for all types, so the double->string conversion is done in C++.
The issue in HS2 is simply that impala-shell receives the double from Thrift
correctly, but converts it to a string with Python's str(), which uses fewer
significant digits than the 17 that are usual in C++.
Created a ticket about this: IMPALA-10660
> Hive writes don't lose precision. If the data was written by Hive then Impala
> can read the values correctly:
I am not sure that Hive is more correct than Impala.
For select cast('-0.43149576573887316' as double); Hive returns the exact
number. But it returns the same number ...316 for every input from ...314 to
...318, so Hive also rounds at this precision, it just does it differently
than Impala. AFAIK Hive actually deals with decimals in a less standard way
than Impala.
According to
https://en.wikipedia.org/wiki/Double-precision_floating-point_format :
"if a decimal string with at most 15 significant digits is converted to IEEE
754 double-precision representation, and then converted back to a decimal
string with the same number of digits, the final result should match the
original string."
-0.43149576573887316 has 17 significant digits, and at 16-17 digits a double
is not guaranteed to represent a decimal string without loss (while a "normal"
double should be convertible to a decimal string of 17 digits and back without
loss).
>https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h
As we already include this library, we could try to use it for its original
purpose as well (parsing strings to doubles), since we have a similar
precision issue there:
select cast("0.43149576573887316" as double);
result: 0.4314957657388731
Hive returns a different result in this case too.
We convert string to double with a "home-made" function:
https://github.com/apache/impala/blob/master/be/src/util/string-parser.h#L459
[~boroknagyz][~amargoor]
What do you think: should I create a new ticket for string->double, or can you
add it to the scope of this one? I think it would make sense to solve both in
one patch, as string conversion issues may creep in unexpectedly, e.g. when
reading text files, and affect the tests.
> Impala loses double precision because of DECIMAL->DOUBLE cast
> -------------------------------------------------------------
>
> Key: IMPALA-10350
> URL: https://issues.apache.org/jira/browse/IMPALA-10350
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Amogh Margoor
> Priority: Major
> Labels: correctness, ramp-up
> Attachments: test.c
>
>
> Impala might lose precision of double values. Reproduction:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> insert into double_tbl values (-0.43149576573887316);
> {noformat}
> Then inspect the data file:
> {noformat}
> $ hdfs dfs -cat
> /test-warehouse/double_tbl/424097c644088674-c55b910100000000_175064830_data.0.txt
> -0.4314957657388731{noformat}
> The same happens if we store our data in Parquet.
> Hive writes don't lose precision. If the data was written by Hive then Impala
> can read the values correctly:
> {noformat}
> $ bin/run-jdbc-client.sh -t NOSASL -q "select * from double_tbl;"
> Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
> Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
> Executing: select * from double_tbl
> ----[START]----
> -0.43149576573887316
> ----[END]----{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)