[
https://issues.apache.org/jira/browse/DRILL-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203235#comment-15203235
]
Shankar commented on DRILL-4520:
--------------------------------
I tried this one too but getting similar issues.
> Error parsing JSON ( a column with different datatypes )
> --------------------------------------------------------
>
> Key: DRILL-4520
> URL: https://issues.apache.org/jira/browse/DRILL-4520
> Project: Apache Drill
> Issue Type: Test
> Reporter: Shankar
>
> I am stuck in the middle of somewhere. Could you please help me to resolve
> below error.
> I am running query on drill 1.6.0 in cluster on logs json data (150GB size of
> log file) ( 1 json / line).
> {quote}
> solution as per my opinion -
> 1. Either drill should able to ignore those lines(ANY data type) while
> reading or creating the table (CTAS).
> 2. Or Data will get stored as it is with ANY data type if any fields in data
> differs in their data types. This will be useful in the case where other
> columns (excluding ANY data type columns) carrying important informations.
> {quote}
> h4. -------------- test.json --------------
> Abount Data :
> 1. I have just extract 3 lines from logs for test purpose.
> 2. In data field called "ajaxUrl" is differ in datatype. Sometimes it
> contains string and sometime array of jsons and null as well.
> 3. Here in our case - Some events in 150 gb json file are like this where
> they differ in structure. I could say there are only 0.1% (per 150gb json
> file) are such events.
> {noformat}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> {"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot
> Qualifier
> 1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> {noformat}
> h4. -------------- Select Query (ERROR) --------------
> {noformat}
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are
> using a ValueWriter of type NullableVarCharWriterImpl.
> File /tmp/test.json
> Record 2
> Fragment 0:0
> {color}
> h4. -------------- Select Query (works Fine with UNION type) --------------
> Tried UNION type (an experimental feature)
> set `exec.enable_union_type` = true;
> {noformat}
> set `exec.enable_union_type` = true;
> +-------+----------------------------------+
> | ok | summary |
> +-------+----------------------------------+
> | true | exec.enable_union_type updated. |
> +-------+----------------------------------+
> 1 row selected (0.193 seconds)
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> | timestamp | sessionid |
> gameid |
> ajaxUrl | ajaxData |
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> | 1457658600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null
> |
> /player/updatebonus1 | null |
> | 1457771458873 | D18104E8CA3071C7A8F4E141B127 |
> https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043
> | [] | null |
> | 1457958600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null
> |
> /player/updatebonus2 | null |
> +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> 3 rows selected (0.965 seconds)
> {noformat}
> h4. -------------- CTAS Query (ERROR) --------------
> {noformat}
> set `exec.enable_union_type` = true;
> +-------+----------------------------------+
> | ok | summary |
> +-------+----------------------------------+
> | true | exec.enable_union_type updated. |
> +-------+----------------------------------+
> 1 row selected (0.193 seconds)
> create table dfs.tmp.test1 AS
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> {color}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)