[ 
https://issues.apache.org/jira/browse/IMPALA-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316999#comment-17316999
 ] 

Zoltán Borók-Nagy commented on IMPALA-10350:
--------------------------------------------

[~amargoor] I think strtod is fine; we are just hitting the limits of double 
precision with the value -0.43149576573887374.

[https://onlinegdb.com/Bk90zB2rd] (C++17)

[https://onlinegdb.com/ByecxQHhBO] (Java)
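
In case the links above stop working, here is a minimal sketch of the kind of check they are meant to show. It is not the code behind those links. FormatDouble and RoundTrips are hypothetical helper names, and the sketch only assumes that snprintf and strtod round correctly, as they do in glibc. It parses the literal to the nearest double and checks whether printing it with a given number of significant digits and parsing it back yields the same value.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: formats a double with the requested number of
// significant digits (printf "%g" style, trailing zeros removed).
std::string FormatDouble(double value, int significant_digits) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*g", significant_digits, value);
  return buf;
}

// Hypothetical helper: true if printing with `significant_digits` and parsing
// the text back with strtod yields exactly the same double.
bool RoundTrips(double value, int significant_digits) {
  const std::string text = FormatDouble(value, significant_digits);
  return std::strtod(text.c_str(), nullptr) == value;
}
```

With std::numeric_limits<double>::max_digits10 (17) significant digits, RoundTrips() holds for every finite double. With fewer digits it may not, so comparing RoundTrips(d, 16) against RoundTrips(d, 17) for a suspicious value shows whether digits were dropped when writing or whether the literal simply is not representable.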

Lemire's algorithm has a fast path that can be used in most cases: 
[https://github.com/lemire/fast_double_parser/blob/e4f6319bfa9cbc829f7f99ae88c1d2fb205c15e8/include/fast_double_parser.h#L893]

It uses a representation similar to the one Impala uses, i.e. an integer plus a 
scale (power of ten).

It also has a secondary fast path: 

[https://github.com/lemire/fast_double_parser/blob/e4f6319bfa9cbc829f7f99ae88c1d2fb205c15e8/include/fast_double_parser.h#L921]

And if compute_float_64() fails, it falls back to strtod: 
[https://github.com/lemire/fast_double_parser/blob/e4f6319bfa9cbc829f7f99ae88c1d2fb205c15e8/include/fast_double_parser.h#L1254-L1257]

We could probably try compute_float_64() first and, when it fails, fall back 
to strtod in the same way.
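
To make the "fast path first, strtod as fallback" shape concrete, here is a rough sketch. It is not Impala code and it does not call compute_float_64(). It only implements the simple exact fast path (integer of at most 2^53 and a power of ten within [-22, 22]) and falls back to strtod otherwise. FastPathToDouble and DecimalToDouble are hypothetical names, and the sketch assumes the unscaled decimal value fits into 64 bits, which is not true for every Impala DECIMAL.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Powers of ten that are exactly representable as a double (10^0 .. 10^22).
static const double kExactPow10[] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};

// Hypothetical helper: converts digits * 10^power when both factors are
// exactly representable, so a single IEEE multiply or divide rounds correctly.
// Returns std::nullopt when the input is outside that range.
std::optional<double> FastPathToDouble(uint64_t digits, int power, bool negative) {
  if (digits > (uint64_t{1} << 53) || power < -22 || power > 22) return std::nullopt;
  double d = static_cast<double>(digits);
  d = power < 0 ? d / kExactPow10[-power] : d * kExactPow10[power];
  return negative ? -d : d;
}

// Hypothetical helper: tries the fast path, then falls back to strtod on a
// generated "<digits>e<power>" string.
double DecimalToDouble(uint64_t digits, int power, bool negative) {
  if (auto fast = FastPathToDouble(digits, power, negative)) return *fast;
  const std::string text = std::string(negative ? "-" : "") +
                           std::to_string(digits) + "e" + std::to_string(power);
  return std::strtod(text.c_str(), nullptr);
}
```

For example, DecimalToDouble(43149576573887316, -17, true) has more than 2^53 worth of digits, so it takes the strtod branch, while DecimalToDouble(25, -2, false) stays on the fast path. A real change would put compute_float_64() between the two branches.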

As noted in my previous comment, google/wuffs uses a different representation, 
i.e. we'd need to generate the string representation of the decimal value first.

> Impala loses double precision because of DECIMAL->DOUBLE cast
> -------------------------------------------------------------
>
>                 Key: IMPALA-10350
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10350
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Amogh Margoor
>            Priority: Major
>              Labels: correctness, ramp-up
>         Attachments: test.c
>
>
> Impala might lose precision of double values. Reproduction: 
> {noformat}
> create table double_tbl (d double) stored as textfile;
> insert into double_tbl values (-0.43149576573887316);
> {noformat}
>  Then inspect the data file:
> {noformat}
> $ hdfs dfs -cat 
> /test-warehouse/double_tbl/424097c644088674-c55b910100000000_175064830_data.0.txt
>  -0.4314957657388731{noformat}
> The same happens if we store our data in Parquet.
> Hive writes don't lose precision. If the data was written by Hive then Impala 
> can read the values correctly:
> {noformat}
> $ bin/run-jdbc-client.sh -t NOSASL -q "select * from double_tbl;"
> Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
> Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
> Executing: select * from double_tbl
> ----[START]----
> -0.43149576573887316
> ----[END]----{noformat}


