Shankar created DRILL-4520:
------------------------------

             Summary: Error parsing JSON ( a column with different datatypes )
                 Key: DRILL-4520
                 URL: https://issues.apache.org/jira/browse/DRILL-4520
             Project: Apache Drill
          Issue Type: Test
            Reporter: Shankar


I am stuck in the middle of somewhere. Could you please help me to resolve 
below error.
I am running query on drill 1.6.0 in cluster on logs json data (150GB size of 
log file) ( 1 json / line).

{quote}
solution as per my opinion - 
1. Either drill should able to ignore those lines(ANY data type) while reading 
or creating the table (CTAS).
2. Or Data will get stored as it is with ANY data type if any fields in data 
differs in their data types. This will be useful in the case where other 
columns (excluding ANY data type columns) carrying important informations.
{quote}


h4. -------------- test.json --------------

Abount Data : 
1. I have just extract 3 lines from logs for test purpose.
2. In data field called "ajaxUrl" is differ in datatype. Sometimes it contains 
string and sometime array of jsons and null as well. 
3. Here in our case - Some events in 150 gb json file are like this where they 
differ in structure. I could say there are only 0.1% (per 150gb json file) are 
such events.


{noformat}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
{"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot
 Qualifier 
1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
{noformat}


h4. -------------- Select Query  (ERROR) --------------

{noformat}
select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;
{noformat}

{color:red}
Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are 
using a ValueWriter of type NullableVarCharWriterImpl.

File  /tmp/test.json
Record  2
Fragment 0:0
{color}


h4. -------------- Select Query (works Fine with UNION type) --------------
Tried UNION type (an experimental feature)
set `exec.enable_union_type` = true;

{noformat}

set `exec.enable_union_type` = true;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | exec.enable_union_type updated.  |
+-------+----------------------------------+
1 row selected (0.193 seconds)



select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;

+----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
|   timestamp    |           sessionid            |                             
               gameid                                            |        
ajaxUrl        | ajaxData  |
+----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
| 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                        
                                                                 | 
/player/updatebonus1  | null      |
| 1457771458873  | D18104E8CA3071C7A8F4E141B127   | 
https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043
  | []                    | null      |
| 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                        
                                                                 | 
/player/updatebonus2  | null      |
+----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
3 rows selected (0.965 seconds)

{noformat}



h4. -------------- CTAS Query (ERROR) --------------


{noformat}

set `exec.enable_union_type` = true;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | exec.enable_union_type updated.  |
+-------+----------------------------------+
1 row selected (0.193 seconds)


create table dfs.tmp.test1 AS 
select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;
{noformat}


{color:red}
Error: SYSTEM ERROR: NullPointerException

Fragment 0:0
{color}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to