[jira] [Updated] (DRILL-6154) NaN, Infinity issues

Volodymyr Tkach (JIRA) Wed, 14 Feb 2018 04:23:48 -0800

     [ 
https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Volodymyr Tkach updated DRILL-6154:
-----------------------------------
    Description: 
1.Issue

*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONS:*
 - *sqrt*
 - *trunc*

*ISSUE_DESCRIPTION:* According to DRILL-5919/ new json number literals were 
added: *NaN, Infinity, -Infinity*. The new data types must be processed 
properly by existing functions. There are a few issues:
 *1. SQRT function*. Run the following test query:
{code:java}
select sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) 
NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
 
 - EXPECTED_RESULT: it was expected to get the following result: _NaN, 
Infinity, NaN_ (expected result is based on java Math.sqrt() method)
 - ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_

*2. TRUNC function*. According to DRILL docs 
([https://drill.apache.org/docs/math-and-trig/):] _TRUNC(x, y) : Truncates x to 
y decimal places. *Specifying y is optional. Default is 1*_. So, the function 
must work properly without specifying *y*
 However an error message appears. Run test_query:
{code:java}
select trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, 
trunc(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
 - EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, NaN*
 - ACTUAL_RESULT: it appears the following error message: *Query Failed: An 
Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: NumberFormatException Fragment 0:0 [Error Id: 
95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*

Please investigate and fix, test file attached PN_Inf_NaN.json

2. Issue 
 *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* INNER JOIN

*ISSUE_DESCRIPTION:* There were added new Json data types in DRILL-5919: *NaN, 
Infinity, -Infinity*. 
 During testing activities, it was detected a bit strange behavior of INNER 
JOIN operator - different query results in almost the same queries. 
 *Query1*
{code:java}
 select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join 
dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
*Query2*
{code:java}
 select distinct t.name from dfs.tmp.`ObjsX.json` t inner join 
dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
*Query1* differs from *Query2* by 1 columns only:
 - In *Query1* - 2 columns are selected - t.name, tt.name
 - In *Query2* - 1 column is selected - t.name

However *Query1*/*Query2* return completely different results:
 - *Query1* returns
{code:java}
        name         name0
        object2         object2
        object2         object3
        object2         object4
        object3         object2
        object3         object3
        object3         object4
        object4         object2
        object4         object3
        object4         object4
        {code}
This result seems to be correct.

 - *Query2* returns _*No result found*_, not expected:
 *EXPECTED_RESULT:*
{code:java}
        name
        object2
        object3
        object4
        {code}

*ACTUAL_RESULT:(*
{code:java}
No result found{code}
*NB!:* the issue appears only if tables are _*JOINed by a column which contains 
newly-added data types (NaN, Infinity, -Infinity)*_. The issue is not 
reproducible is a user is JOINing tables by a column containing other data types

3. Issue
 *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* ORDER BY, DESC

*THIS ISSUE REFERS TO:DRILL-5919*

*ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when 
sorting columns containing NaN values. In one case it considers NaN to be the 
largest value, in another - the smallest one. 
 *Steps:*
 - Select from the attached test file (orderBy.json, attached)
{code:java}
SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, attr4{code}

 - Check the attached screen shot (orderByIssue.jpg):
 *EXPECTED_RESULT:* It was expected the 'ORDER BY' clause to sort attr4 columns 
data in the same way (most probably NaN should be the largest, see *NB*)
 *ACTUAL_RESULT:* attr4 column's values were sorted in different ways: for 
'obj1'/'obj3' NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.

*NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() 
methods) treats NaN as the largest value

4. Issue
 *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* max(column_name), min(column_name),

*ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if 
the selected column contains NaN value.
{code:java}
SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by 
name{code}
Result
{code:java}
name    Min     Max
obj1    NaN     NaN
obj2    NaN     NaN
obj3    NaN     NaN
obj4    NaN     NaN
{code}
As for me, this logic should be revised, current behavior is a bit confusing:
 - Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} will 
return NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's min/max 
logic can be adjusted to the Postgres' one, see postgres.jpg
 - Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs

  was:
1.Issue

*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONS:*
- *sqrt*
- *trunc*

*ISSUE_DESCRIPTION:* According to DRILL-5919/MD-2745, new json number literals 
were added: *NaN, Infinity, -Infinity*. The new data types must be processed 
properly by existing functions. There are a few issues:
*1. SQRT function*. Run the following test query: {code}select sqrt(Nan) NaN, 
sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) NEG_INF from 
dfs.tmp.`PN_Inf_NaN.json`{code} 
- EXPECTED_RESULT: it was expected to get the following result: _NaN, Infinity, 
NaN_ (expected result is based on java Math.sqrt() method)
- ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_

*2. TRUNC function*. According to DRILL docs 
(https://drill.apache.org/docs/math-and-trig/): _TRUNC(x, y) : Truncates x to y 
decimal places. *Specifying y is optional. Default is 1*_. So, the function 
must work properly without specifying *y*
However an error message appears. Run test_query: {code}select trunc(Nan) NaN, 
trunc(Positive_Infinity) POS_INF, trunc(Negative_Infinity) NEG_INF from 
dfs.tmp.`PN_Inf_NaN.json`{code}
- EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, NaN*
- ACTUAL_RESULT: it appears the following error message: *Query Failed: An 
Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: NumberFormatException Fragment 0:0 [Error Id: 
95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*

Please investigate and fix, test file attached PN_Inf_NaN.json

2. Issue 
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* INNER JOIN

*ISSUE_DESCRIPTION:* There were added new Json data types in 
MD-2745/DRILL-5919: *NaN, Infinity, -Infinity*. 
During testing activities, it was detected a bit strange behavior of INNER JOIN 
operator - different query results in almost the same queries. 
*Query1* {code} select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t 
inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
*Query2* {code} select distinct t.name from dfs.tmp.`ObjsX.json` t inner join 
dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}

*Query1* differs from *Query2* by 1 columns only:
- In *Query1* - 2 columns are selected - t.name, tt.name
- In *Query2* - 1 column is selected - t.name

However *Query1*/*Query2* return completely different results:
- *Query1* returns
        {code}
        name         name0
        object2         object2
        object2         object3
        object2         object4
        object3         object2
        object3         object3
        object3         object4
        object4         object2
        object4         object3
        object4         object4
        {code}
This result seems to be correct.

- *Query2* returns _*No result found*_, not expected:
        *EXPECTED_RESULT:*
        {code}
        name
        object2
        object3
        object4
        {code}
        
        *ACTUAL_RESULT*: {code}No result found{code}

*NB!:* the issue appears only if tables are _*JOINed by a column which contains 
newly-added data types (NaN, Infinity, -Infinity)*_. The issue is not 
reproducible is a user is JOINing tables by a column containing other data types

3. Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* ORDER BY, DESC

*THIS ISSUE REFERS TO:* MD-2745

*ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when 
sorting columns containing NaN values. In one case it considers NaN to be the 
largest value, in another - the smallest one. 
*Steps:*
- Select from the attached test file (orderBy.json, attached)
        {code}SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, 
attr4{code}
- Check the attached screen shot (orderByIssue.jpg):
        *EXPECTED_RESULT:* It was expected the 'ORDER BY' clause to sort attr4 
columns data in the same way (most probably NaN should be the largest, see *NB*)
        *ACTUAL_RESULT:* attr4 column's values were sorted in different ways: 
for 'obj1'/'obj3' NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.

*NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() 
methods) treats NaN as the largest value

4. Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONALITY:* max(column_name), min(column_name),

*ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if 
the selected column contains NaN value.
{code}SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by 
name{code}
Result
{code}
name    Min     Max
obj1    NaN     NaN
obj2    NaN     NaN
obj3    NaN     NaN
obj4    NaN     NaN
{code}

As for me, this logic should be revised, current behavior is a bit confusing:
- Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} will 
return NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's min/max 
logic can be adjusted to the Postgres' one, see postgres.jpg
- Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs


> NaN, Infinity issues
> --------------------
>
>                 Key: DRILL-6154
>                 URL: https://issues.apache.org/jira/browse/DRILL-6154
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Volodymyr Tkach
>            Assignee: Volodymyr Tkach
>            Priority: Major
>             Fix For: 1.13.0
>
>
> 1.Issue
> *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONS:*
>  - *sqrt*
>  - *trunc*
> *ISSUE_DESCRIPTION:* According to DRILL-5919/ new json number literals were 
> added: *NaN, Infinity, -Infinity*. The new data types must be processed 
> properly by existing functions. There are a few issues:
>  *1. SQRT function*. Run the following test query:
> {code:java}
> select sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, 
> sqrt(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
>  
>  - EXPECTED_RESULT: it was expected to get the following result: _NaN, 
> Infinity, NaN_ (expected result is based on java Math.sqrt() method)
>  - ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_
> *2. TRUNC function*. According to DRILL docs 
> ([https://drill.apache.org/docs/math-and-trig/):] _TRUNC(x, y) : Truncates x 
> to y decimal places. *Specifying y is optional. Default is 1*_. So, the 
> function must work properly without specifying *y*
>  However an error message appears. Run test_query:
> {code:java}
> select trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, 
> trunc(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
>  - EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, 
> NaN*
>  - ACTUAL_RESULT: it appears the following error message: *Query Failed: An 
> Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: NumberFormatException Fragment 0:0 [Error Id: 
> 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*
> Please investigate and fix, test file attached PN_Inf_NaN.json
> 2. Issue 
>  *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* INNER JOIN
> *ISSUE_DESCRIPTION:* There were added new Json data types in DRILL-5919: 
> *NaN, Infinity, -Infinity*. 
>  During testing activities, it was detected a bit strange behavior of INNER 
> JOIN operator - different query results in almost the same queries. 
>  *Query1*
> {code:java}
>  select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join 
> dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query2*
> {code:java}
>  select distinct t.name from dfs.tmp.`ObjsX.json` t inner join 
> dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query1* differs from *Query2* by 1 columns only:
>  - In *Query1* - 2 columns are selected - t.name, tt.name
>  - In *Query2* - 1 column is selected - t.name
> However *Query1*/*Query2* return completely different results:
>  - *Query1* returns
> {code:java}
>       name         name0
>       object2         object2
>       object2         object3
>       object2         object4
>       object3         object2
>       object3         object3
>       object3         object4
>       object4         object2
>       object4         object3
>       object4         object4
>       {code}
> This result seems to be correct.
>  - *Query2* returns _*No result found*_, not expected:
>  *EXPECTED_RESULT:*
> {code:java}
>       name
>       object2
>       object3
>       object4
>       {code}
> *ACTUAL_RESULT:(*
> {code:java}
> No result found{code}
> *NB!:* the issue appears only if tables are _*JOINed by a column which 
> contains newly-added data types (NaN, Infinity, -Infinity)*_. The issue is 
> not reproducible is a user is JOINing tables by a column containing other 
> data types
> 3. Issue
>  *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* ORDER BY, DESC
> *THIS ISSUE REFERS TO:DRILL-5919*
> *ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when 
> sorting columns containing NaN values. In one case it considers NaN to be the 
> largest value, in another - the smallest one. 
>  *Steps:*
>  - Select from the attached test file (orderBy.json, attached)
> {code:java}
> SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, attr4{code}
>  - Check the attached screen shot (orderByIssue.jpg):
>  *EXPECTED_RESULT:* It was expected the 'ORDER BY' clause to sort attr4 
> columns data in the same way (most probably NaN should be the largest, see 
> *NB*)
>  *ACTUAL_RESULT:* attr4 column's values were sorted in different ways: for 
> 'obj1'/'obj3' NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.
> *NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() 
> methods) treats NaN as the largest value
> 4. Issue
>  *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* max(column_name), min(column_name),
> *ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if 
> the selected column contains NaN value.
> {code:java}
> SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by 
> name{code}
> Result
> {code:java}
> name  Min     Max
> obj1  NaN     NaN
> obj2  NaN     NaN
> obj3  NaN     NaN
> obj4  NaN     NaN
> {code}
> As for me, this logic should be revised, current behavior is a bit confusing:
>  - Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} 
> will return NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's 
> min/max logic can be adjusted to the Postgres' one, see postgres.jpg
>  - Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-6154) NaN, Infinity issues

Reply via email to