Hanumath Rao Maduri created DRILL-7113:
------------------------------------------
Summary: Issue with filtering null values from MapRDB-JSON
Key: DRILL-7113
URL: https://issues.apache.org/jira/browse/DRILL-7113
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Aman Sinha
Fix For: 1.16.0, 1.17.0
When the Drill is querying documents from MapRDBJSON that contain fields with
null value, it returns the wrong result.
The issue is locally reproduced.
Please find the repro steps:
[1] Create a MaprDBJSON table. Say '/tmp/dmdb2/'.
[2] Insert the following sample records to table:
{code:java}
insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person",
"confidence": 0.24}'
insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3",
"confidence": 0.54}'
insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4",
"confidence": null}'
{code}
We can see that for field 'confidence' document 1 has value 0.24, document 3
has value 0.54, document 2 does not have the field and document 4 has the field
with value null.
[3] Query the table from DRILL.
*Query 1:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person | 0.24 |
| person2 | null |
| person3 | 0.54 |
| person4 | null |
+----------+-------------+
4 rows selected (0.2 seconds)
{code}
*Query 2:*
{code:java}
0: jdbc:drill:> select * from dfs.tmp.dmdb2;
+------+-------------+----------+
| _id | confidence | label |
+------+-------------+----------+
| 1 | 0.24 | person |
| 2 | null | person2 |
| 3 | 0.54 | person3 |
| 4 | null | person4 |
+------+-------------+----------+
4 rows selected (0.174 seconds)
{code}
*Query 3:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is
not null;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person | 0.24 |
| person3 | 0.54 |
| person4 | null |
+----------+-------------+
3 rows selected (0.192 seconds)
{code}
*Query 4:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is
null;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person2 | null |
+----------+-------------+
1 row selected (0.262 seconds)
{code}
As you can see, Query 3 which queries for all documents with confidence value
'is not null', returns a document with null value.
*Other observation:*
Querying the same data using DRILL without MapRDB provides the correct result.
For example, create 4 different JSON files with following data:
{"label": "person", "confidence": 0.24} \{"label": "person2"} \{"label":
"person3", "confidence": 0.54} \{"label": "person4", "confidence": null}
Query it directly using DRILL:
*Query 5:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person4 | null |
| person3 | 0.54 |
| person2 | null |
| person | 0.24 |
+----------+-------------+
4 rows selected (0.203 seconds)
{code}
*Query 6:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is
null;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person4 | null |
| person2 | null |
+----------+-------------+
2 rows selected (0.352 seconds)
{code}
*Query 7:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not
null;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person3 | 0.54 |
| person | 0.24 |
+----------+-------------+
2 rows selected (0.265 seconds)
{code}
As seen in query 6 & 7, it returns the correct result.
I believe the issue is at the MapRDB layer where it is fetching the results.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)