[ 
https://issues.apache.org/jira/browse/DRILL-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203235#comment-15203235
 ] 

Shankar commented on DRILL-4520:
--------------------------------

I tried this one too but getting similar issues. 

> Error parsing JSON ( a column with different datatypes )
> --------------------------------------------------------
>
>                 Key: DRILL-4520
>                 URL: https://issues.apache.org/jira/browse/DRILL-4520
>             Project: Apache Drill
>          Issue Type: Test
>            Reporter: Shankar
>
> I am stuck in the middle of somewhere. Could you please help me to resolve 
> below error.
> I am running query on drill 1.6.0 in cluster on logs json data (150GB size of 
> log file) ( 1 json / line).
> {quote}
> solution as per my opinion - 
> 1. Either drill should able to ignore those lines(ANY data type) while 
> reading or creating the table (CTAS).
> 2. Or Data will get stored as it is with ANY data type if any fields in data 
> differs in their data types. This will be useful in the case where other 
> columns (excluding ANY data type columns) carrying important informations.
> {quote}
> h4. -------------- test.json --------------
> Abount Data : 
> 1. I have just extract 3 lines from logs for test purpose.
> 2. In data field called "ajaxUrl" is differ in datatype. Sometimes it 
> contains string and sometime array of jsons and null as well. 
> 3. Here in our case - Some events in 150 gb json file are like this where 
> they differ in structure. I could say there are only 0.1% (per 150gb json 
> file) are such events.
> {noformat}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> {"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot
>  Qualifier 
> 1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> {noformat}
> h4. -------------- Select Query  (ERROR) --------------
> {noformat}
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are 
> using a ValueWriter of type NullableVarCharWriterImpl.
> File  /tmp/test.json
> Record  2
> Fragment 0:0
> {color}
> h4. -------------- Select Query (works Fine with UNION type) --------------
> Tried UNION type (an experimental feature)
> set `exec.enable_union_type` = true;
> {noformat}
> set `exec.enable_union_type` = true;
> +-------+----------------------------------+
> |  ok   |             summary              |
> +-------+----------------------------------+
> | true  | exec.enable_union_type updated.  |
> +-------+----------------------------------+
> 1 row selected (0.193 seconds)
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> |   timestamp    |           sessionid            |                           
>                  gameid                                            |        
> ajaxUrl        | ajaxData  |
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> | 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                      
>                                                                    | 
> /player/updatebonus1  | null      |
> | 1457771458873  | D18104E8CA3071C7A8F4E141B127   | 
> https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043
>   | []                    | null      |
> | 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                      
>                                                                    | 
> /player/updatebonus2  | null      |
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> 3 rows selected (0.965 seconds)
> {noformat}
> h4. -------------- CTAS Query (ERROR) --------------
> {noformat}
> set `exec.enable_union_type` = true;
> +-------+----------------------------------+
> |  ok   |             summary              |
> +-------+----------------------------------+
> | true  | exec.enable_union_type updated.  |
> +-------+----------------------------------+
> 1 row selected (0.193 seconds)
> create table dfs.tmp.test1 AS 
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> {color}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to