Hi,
Querying nested data from a JSON file with and with out setting store.json.all_text_mode results in errors. Data that was used in the test is available here - https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5 There is an option (on the top right) to Export as JSON. Drill commit ID : 0686bc23 (on master) [root@centos-01 ~]# hadoop fs -ls /tmp/rows.json -rwxr-xr-x 3 root root 156417523 2015-09-10 17:18 /tmp/rows.json [root@centos-01 food_inspection_data]# hadoop fs -put rows.json /tmp [root@centos-01 food_inspection_data]# cd /opt/mapr/drill/drill-1.2.0/bin [root@centos-01 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp" apache drill 1.0.0 "a little sql for your nosql" Query 1) 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`; Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Line 1023 Column 29 Field smallest Fragment 0:0 [Error Id: 0844933a-ed61-4137-9970-203576d5770d on centos-02.qa.lab:31010] (state=,code=0) *-----* Query 2) Drill seems to be checking for correctness of data much deeper in the nesting. Note that I do a count over field* meta, *and field* "smallest" *reported in error is much deeper inside* meta, * meta.view.columns.cachedContents.smallest 0: jdbc:drill:schema=dfs.tmp> select count(meta) from `rows.json`; Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Line 1023 Column 29 Field smallest Fragment 0:0 [Error Id: 66e7f461-6671-4862-bde5-685c8c2304c9 on centos-04.qa.lab:31010] (state=,code=0) Explain plan for Query (2) 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(meta) from `rows.json`; *+------+------+* *| **text** | **json** |* *+------+------+* *| *00-00 Screen 00-01 Project(EXPR$0=[$0]) 00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) 00-03 StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) 00-04 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/rows.json, numFiles=1, columns=[`meta`], files=[maprfs:///tmp/rows.json]]]) Query 3) 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t; Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Line 1023 Column 29 Field smallest Fragment 0:0 [Error Id: 55ea650c-9587-42d8-aec9-0f5a4b77ee4c on centos-04.qa.lab:31010] (state=,code=0) Query 4) - The count returned is wrong! 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json` t; *+---------+* *| **EXPR$0 ** |* *+---------+* *| *1 * |* *+---------+* 1 row selected (1.57 seconds) ------------------- The below queries were run after json all text mode configuration was enabled. 0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true; *+-------+------------------------------------+* *| ** ok ** | ** summary ** |* *+-------+------------------------------------+* *| *true * | *store.json.all_text_mode updated. * |* *+-------+------------------------------------+* 1 row selected (0.218 seconds) 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t; Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Fragment 0:0 [Error Id: 5eeee0f3-69fd-445c-a149-a3aa2f18d1cb on centos-04.qa.lab:31010] (state=,code=0) COUNT returned in below query is wrong. 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json` t; +---------+ | EXPR$0 | +---------+ | 1 | +---------+ 1 row selected (0.948 seconds) Query to perform count over nested data. 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.columns.cachedContents.smallest) from `rows.json` t; Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Fragment 0:0 [Error Id: 4ce0d5fe-abd7-47d9-8106-269fe37409e6 on centos-04.qa.lab:31010] (state=,code=0) 0: jdbc:drill:schema=dfs.tmp> select t.meta.view.columns.cachedContents.smallest from `rows.json` t; Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Fragment 0:0 [Error Id: b2514414-af51-46c5-938d-ff792e6a8a8c on centos-04.qa.lab:31010] (state=,code=0) 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`; Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl. File /tmp/rows.json Record 1 Fragment 0:0 [Error Id: 3615e088-64cc-4cba-8cfe-602563f4e16c on centos-04.qa.lab:31010] (state=,code=0) These are the occorrences of field named *smallest* in the JSON file Administrators-MacBook-Pro:Downloads kfaraaz$ grep "smallest" rows.json "smallest" : "44247", "smallest" : "'C' A CATERING AND EVENT COMPANY", "smallest" : "'C' A CATERING AND EVENT COMPANY", "smallest" : "0", "smallest" : "(convenience store)", "smallest" : "All", "smallest" : " ", "smallest" : "alsip", "smallest" : "IL", "smallest" : "60007", "smallest" : "2010-01-04T00:00:00", "smallest" : "1315 license reinspection", "smallest" : "Business Not Located", "smallest" : "1. SOURCE SOUND CONDITION, NO SPOILAGE, FOODS PROPERLY LABELED, SHELLFISH TAGS IN PLACE - Comments: ", "smallest" : "41.64467013219805", "smallest" : "-87.91442843927047", "smallest" : { , [ 101045, "180563EF-F00C-41F9-A0C2-7B2100AD2DED", 101045, 1441883778, "386464", 1441883778, "386464", null, "343208", "SUBWAY # 26817-309 W. MONROE", "SUBWAY", "1519940", "Restaurant", "Risk 1 (High)", "309 W MONROE ST ", "CHICAGO", "IL", "60606", "2010-07-13T00:00:00", "Canvass", "Pass", "32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED - Comments: All food and non-food contact equipment and utensils shall be smooth, easily cleanable, and durable, and shall be in good repair. A few of the smallest cutting boards, with deep/dark grooves must be sanded/bleached- to make smooth/easily cleanable. | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS - Comments: All food and non-food contact surfaces of equipment and all food storage utensils shall be thoroughly cleaned and sanitized daily. Interior of ice chute at soda dispensers need cleaning. | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: The floors shall be constructed per code, be smooth and easily cleaned, and be kept clean and in good repair. Floors of walk-in cooler (around corners) need cleaning; baseboard at the outside base of the walk-in cooler door is in poor repair, must fix.", "41.88052826494154", "-87.63556938091433", [ null, "41.88052826494154", "-87.63556938091433", null, false ] ] Thanks, Khurram
