[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628664#comment-15628664
 ] 

Khurram Faraaz commented on DRILL-4653:
---------------------------------------

[~kkhatua] I ran these tests with malformed JSON on Drill 1.9.0, git commit 
ID 83513daf.
[~ssriniva123] Is this the expected behavior?

case (1)

[test@cent01 drill_4653]# cat badjson_01.json
{"key":"test string"}
{"key":"foo"}
{"key":"foobar"
{"key":"blah"}
{"key":"temp"}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_01.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 
123)): was expecting comma to separate OBJECT entries

File  /tmp/badjson_01.json
Record  3
Column  2
Fragment 0:0

[Error Id: 76e0cc69-229b-40b7-93fd-9ca9f6a22473 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_01.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 
123)): was expecting comma to separate OBJECT entries

File  /tmp/badjson_01.json
Record  3
Column  2
Fragment 0:0

[Error Id: 9918e669-1638-44f8-a4e1-ffa33b5ef830 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

case (2)

[test@cent01 drill_4653]# cat badjson_02.json
{
    "key":"foo",
    "badarray":[1,3,4,5,6,7,8,,
    "key":"test string",
    "key":"foobar"
}
[test@cent01 drill_4653]#

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_02.json`;
Error: DATA_READ ERROR: Unexpected character (',' (code 44)): expected a valid 
value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: org.apache.drill.exec.store.dfs.DrillFSDataInputStream@2380dfc9; 
line: 3, column: 32]

Line  3
Column  33
Field  badarray
Fragment 0:0

[Error Id: a25159c7-7770-4a1d-870c-dd479dd01a7d on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_02.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character (',' (code 
44)): expected a valid value (number, String, array, object, 'true', 'false' or 
'null')

File  /tmp/badjson_02.json
Record  1
Column  32
Fragment 0:0

[Error Id: 2ae443eb-7fc2-4648-b9ea-1742d23932ae on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

case (3)

[test@cent01 drill_4653]# cat badjson_03.json
{
    "key":"foo",
    "key":"foobar",
    "key":"test string",
    "key":"string",
    "key":
}
[test@cent01 drill_4653]#

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_03.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 
125)): expected a value

File  /tmp/badjson_03.json
Record  1
Column  2
Fragment 0:0

[Error Id: 39d94490-5186-46d9-9631-94ec32d3094e on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_03.json` where key = 'foobar';
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 
125)): expected a value

File  /tmp/badjson_03.json
Record  1
Column  2
Fragment 0:0

[Error Id: b20cd289-2c7b-41ec-b18c-6941205c4d1d on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

case (4)

[test@cent01 drill_4653]# cat badjson_04.json
{"key":1}
{"key":2}
{"key":3}
{"key":
[test@cent01 drill_4653]#

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_04.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input 
within/between OBJECT entries

File  /tmp/badjson_04.json
Record  4
Column  39
Fragment 0:0

[Error Id: 9cd4d9a8-5871-4eaa-a68e-c6eab3bf2e41 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

case (5)

[test@cent01 drill_4653]# cat badjson_05.json
{
    "key1":"foobar",
    "key2":[1,3,4,5,6,7,8,9],
    "key3":{ "key4":},
    "key5":"foo"
}
[test@cent01 drill_4653]#

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_05.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 
125)): expected a value

File  /tmp/badjson_05.json
Record  1
Column  22
Fragment 0:0

[Error Id: 18be71b9-bb58-4cd5-9e74-3c19ab282cfd on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

case (6)

[test@cent01 drill_4653]# cat badjson_06.json
{
    "name":"John Doe",
    "age":33,
    "dept":"IT",
    "address":{
                  "street":"some street",
                  "city":"some city",
                  "zip":
              }
    "isManager":"yes"
}
[test@cent01 drill_4653]#

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_06.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 
125)): expected a value

File  /tmp/badjson_06.json
Record  1
Column  16
Fragment 0:0

[Error Id: bf83ea0e-d708-4c3b-b50c-6923fc17c6b6 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}


> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: 1.9.0
>
>
> Currently, a Drill query terminates on the first invalid JSON line it 
> encounters. Drill should instead skip the bad records and continue. A 
> setting along the lines of (ignore.malformed.json) would help.
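
For illustration only, the behavior requested in the description can be sketched outside Drill. This is a minimal Python sketch, not Drill's reader: the function name read_records and the ignore_malformed flag are hypothetical, and it assumes one JSON record per line, as in badjson_01.json.

```python
import json
from typing import Any, Iterable, List, Tuple


def read_records(
    lines: Iterable[str], ignore_malformed: bool = True
) -> Tuple[List[Any], List[int]]:
    """Parse one JSON record per line.

    Returns (records, skipped), where skipped holds the 1-based numbers of
    the lines that failed to parse. With ignore_malformed=False the first
    bad line raises, which is the fail-fast behavior reported in this issue.
    """
    records: List[Any] = []
    skipped: List[int] = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are not records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            if not ignore_malformed:
                raise
            skipped.append(number)  # remember which line was dropped
    return records, skipped


# Contents of badjson_01.json from the comment above.
BADJSON_01 = [
    '{"key":"test string"}',
    '{"key":"foo"}',
    '{"key":"foobar"',
    '{"key":"blah"}',
    '{"key":"temp"}',
]

records, skipped = read_records(BADJSON_01)
print([r["key"] for r in records])  # ['test string', 'foo', 'blah', 'temp']
print(skipped)  # [3]
```

The sketch recovers only because it treats each line as a record boundary. Files such as badjson_02.json, badjson_03.json, badjson_05.json and badjson_06.json hold a single multi-line record, so there is nothing left to return once that record is skipped.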



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
