[ 
https://issues.apache.org/jira/browse/DRILL-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157935#comment-16157935
 ] 

David Lee commented on DRILL-5769:
----------------------------------

My original problem was encountered when running a select * on a single 5 GB 
JSON file, with a mix of nested keys and arrays, to convert it to Parquet. 
Splitting that file into 10 smaller files worked and produced 10 Parquet files, 
but the same technique then failed on a different 6 GB JSON file, since I have 
no control over when an empty array may show up.
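
For illustration only, the splitting step could be sketched as follows. This is
a minimal sketch, not the script actually used: it assumes the input is
newline-delimited JSON (one record per line), and the function name
split_json_lines and the part_NNN.json output names are hypothetical.

```python
import os


def split_json_lines(src_path, out_dir, parts):
    """Split a newline-delimited JSON file into `parts` smaller files.

    Assumption: one JSON record per line. Lines are dealt out round-robin
    by line number, so the source is streamed and never loaded into memory
    as a whole.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = [
        open(os.path.join(out_dir, f"part_{i:03d}.json"), "w", encoding="utf-8")
        for i in range(parts)
    ]
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for n, line in enumerate(src):
                if not line.strip():
                    continue  # skip blank lines
                # Make sure every record ends with exactly one newline.
                outputs[n % parts].write(line.rstrip("\n") + "\n")
    finally:
        for out in outputs:
            out.close()
    return [out.name for out in outputs]
```

Round-robin distribution does not preserve the original record order within a
single output file, and it does nothing to avoid the failure described above: a
record with an empty array can still land in any of the parts.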

> IndexOutOfBoundsException when querying JSON files
> --------------------------------------------------
>
>                 Key: DRILL-5769
>                 URL: https://issues.apache.org/jira/browse/DRILL-5769
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server, Storage - JSON
>    Affects Versions: 1.10.0
>         Environment: *jdk_8u45_x64*
> *single drillbit running on zookeeper*
> *Following options set to TRUE:*
> drill.exec.functions.cast_empty_string_to_null
> store.json.all_text_mode
> store.parquet.enable_dictionary_encoding
> store.parquet.use_new_reader
>            Reporter: David Lee
>            Assignee: Jinfeng Ni
>             Fix For: 1.10.0, 1.11.0, 1.12.0
>
>         Attachments: 001.json, 100.json, 111.json
>
>
> *Running the following SQL on these three JSON files fails:*
> 001.json 100.json 111.json
> select t.id
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Error:*
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IndexOutOfBoundsException: index: 1024, length: 1 (expected: range(0, 1024)) 
> Fragment 0:0 [Error Id: xxxx.xxxx...
> *However, running the same SQL on two out of three files works:*
> select t.id
> from dfs.`/tmp/1??.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/?1?.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/??1.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Changing the selected column from t.id to t.* also works:*
> select *
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
