[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628664#comment-15628664 ]
Khurram Faraaz commented on DRILL-4653:
---------------------------------------

[~kkhatua] I tried these tests with malformed JSON on Drill 1.9.0 (git commit ID: 83513daf).
[~ssriniva123] Is this the expected behavior? In every case below, the query still fails with a DATA_READ ERROR at the first malformed record instead of skipping it.

case (1)
[test@cent01 drill_4653]# cat badjson_01.json
{"key":"test string"}
{"key":"foo"}
{"key":"foobar"
{"key":"blah"}
{"key":"temp"}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_01.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 123)): was expecting comma to separate OBJECT entries

File /tmp/badjson_01.json
Record 3
Column 2
Fragment 0:0

[Error Id: 76e0cc69-229b-40b7-93fd-9ca9f6a22473 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_01.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('{' (code 123)): was expecting comma to separate OBJECT entries

File /tmp/badjson_01.json
Record 3
Column 2
Fragment 0:0

[Error Id: 9918e669-1638-44f8-a4e1-ffa33b5ef830 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

case (2)
[test@cent01 drill_4653]# cat badjson_02.json
{
"key":"foo",
"badarray":[1,3,4,5,6,7,8,,
"key":"test string",
"key":"foobar"
}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_02.json`;
Error: DATA_READ ERROR: Unexpected character (',' (code 44)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: org.apache.drill.exec.store.dfs.DrillFSDataInputStream@2380dfc9; line: 3, column: 32]

Line 3
Column 33
Field badarray
Fragment 0:0

[Error Id: a25159c7-7770-4a1d-870c-dd479dd01a7d on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_02.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character (',' (code 44)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

File /tmp/badjson_02.json
Record 1
Column 32
Fragment 0:0

[Error Id: 2ae443eb-7fc2-4648-b9ea-1742d23932ae on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

case (3)
[test@cent01 drill_4653]# cat badjson_03.json
{
"key":"foo",
"key":"foobar",
"key":"test string",
"key":"string",
"key":
}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_03.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 125)): expected a value

File /tmp/badjson_03.json
Record 1
Column 2
Fragment 0:0

[Error Id: 39d94490-5186-46d9-9631-94ec32d3094e on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_03.json` where key ='foobar';
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 125)): expected a value

File /tmp/badjson_03.json
Record 1
Column 2
Fragment 0:0

[Error Id: b20cd289-2c7b-41ec-b18c-6941205c4d1d on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

case (4)
[test@cent01 drill_4653]# cat badjson_04.json
{"key":1}
{"key":2}
{"key":3}
{"key":

{noformat}
0: jdbc:drill:schema=dfs.tmp> select key from `badjson_04.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input within/between OBJECT entries

File /tmp/badjson_04.json
Record 4
Column 39
Fragment 0:0

[Error Id: 9cd4d9a8-5871-4eaa-a68e-c6eab3bf2e41 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

case (5)
[test@cent01 drill_4653]# cat badjson_05.json
{
"key1":"foobar",
"key2":[1,3,4,5,6,7,8,9],
"key3":{
"key4":},
"key5":"foo"
}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_05.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 125)): expected a value

File /tmp/badjson_05.json
Record 1
Column 22
Fragment 0:0

[Error Id: 18be71b9-bb58-4cd5-9e74-3c19ab282cfd on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

case (6)
[test@cent01 drill_4653]# cat badjson_06.json
{
"name":"John Doe",
"age":33,
"dept":"IT",
"address":{
"street":"some street",
"city":"some city",
"zip":
}
"isManager":"yes"
}

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badjson_06.json`;
Error: DATA_READ ERROR: Error parsing JSON - Unexpected character ('}' (code 125)): expected a value

File /tmp/badjson_06.json
Record 1
Column 16
Fragment 0:0

[Error Id: bf83ea0e-d708-4c3b-b50c-6923fc17c6b6 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: 1.9.0
>
>
> Currently a Drill query terminates upon first encountering an invalid JSON line.
> Drill has to continue progressing after ignoring the bad records. Something
> similar to a setting of (ignore.malformed.json) would help.
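For context on what "ignoring the bad records" could mean for inputs like these, here is a minimal Python sketch of per-line recovery for newline-delimited JSON. It is not Drill's implementation, and the function name `read_ndjson_skipping_invalid` is hypothetical. Assuming the fix for this issue is gated behind an opt-in session option (`store.json.reader.skip_invalid_records`, default `false`), failures like the ones above would be expected until that option is enabled.

```python
import json
from typing import Any, Iterable, List, Tuple


def read_ndjson_skipping_invalid(lines: Iterable[str]) -> Tuple[List[Any], List[int]]:
    """Parse newline-delimited JSON, treating each non-blank line as one record.

    Hypothetical helper (not Drill code): a line that fails to parse is skipped
    instead of aborting the whole read, and its 1-based line number is recorded
    so the caller can report what was dropped.
    """
    records: List[Any] = []
    skipped: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are not records
        try:
            records.append(json.loads(text))
        except json.JSONDecodeError:
            skipped.append(lineno)  # malformed record: note it and keep going
    return records, skipped


if __name__ == "__main__":
    # Contents of badjson_01.json from case (1), one record per line.
    badjson_01 = [
        '{"key":"test string"}',
        '{"key":"foo"}',
        '{"key":"foobar"',
        '{"key":"blah"}',
        '{"key":"temp"}',
    ]
    records, skipped = read_ndjson_skipping_invalid(badjson_01)
    print([r["key"] for r in records])  # ['test string', 'foo', 'blah', 'temp']
    print(skipped)                      # [3]
```

Fed badjson_01.json, the sketch keeps four records and reports line 3 as skipped, which lines up with the `Record 3` in Drill's error above; for badjson_04.json it keeps keys 1, 2 and 3 and skips line 4. Cases (2), (3), (5) and (6) are each a single object spread over several lines, so line-level recovery like this cannot salvage them: no line parses on its own, and at best the whole file is dropped as one bad record.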