[
https://issues.apache.org/jira/browse/DRILL-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216005#comment-16216005
]
David Lee commented on DRILL-5769:
----------------------------------
I finally tracked down which JSON key was causing my IndexOutOfBoundsException
(IOBE) by trimming my JSON file down to
349 MB with around 10,200 JSON records. Unfortunately, I couldn't generate a
smaller test file that reproduces the problem.
349939049 Oct 23 16:50 test.json
The JSON file contains multiple nested DealingSchedule objects like:
"DealingSchedule": {"DealingTime": {"CutOffTimeDetail":
[{"CutOffTimeDetailTimeZone": "1", "CutOffTimeDetail_CountryId": "CU$$$$$AUS",
"CutOffTime": "15:00", "DealingType": "3"}]}}
"DealingSchedule": {"DealingTime": {"CutOffTimeDetail":
[{"CutOffTimeDetailTimeZone": "1", "CutOffTimeDetail_CountryId": "CU$$$$$AUS",
"CutOffTime": "11:00", "DealingType": "3"}]}}
"DealingSchedule": {"ValuationTimeTimeZone": "1", "ValuationTime_CountryId":
"CU$$$$$AUS", "ValuationTime": "12:00"}
"DealingSchedule": {"ValuationTimeTimeZone": "3", "ValuationTime_CountryId":
"CU$$$$$CAN", "ValuationTime": "16:00","DealingTime": {"CutOffTimeDetail":
[{"CutOffTimeDetailTimeZone": "3", "CutOffTimeDetail_CountryId": "CU$$$$$CAN",
"CutOffTime": "16:00", "DealingType": "3"}]}}
Near the end of the file, a different flavor of DealingSchedule appears, which
contains a DealingTimeDetail array with one record. This is the first time the
DealingSchedule.DealingTime.DealingTimeDetail key appears in the JSON file:
"DealingSchedule": {"ValuationTime_CountryId": "CU$$$$$GBR", "ValuationTime":
"08:00", "DealingTime": {"DealingTimeDetail": [{"DealingTimeDetail_CountryId":
"CU$$$$$GBR", "StartTime": "09:00", "EndTime": "17:00"}]}}
It produces the following error:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
Fragment 0:0 [Error Id: 4d7e60fb-b7d0-49bd-9cf7-244dc4d7341d on ...
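For anyone trying to locate where a key path such as
DealingSchedule.DealingTime.DealingTimeDetail first shows up in a large file,
here is a minimal Python sketch. It assumes one JSON record per line (the
actual layout of test.json is not stated above), and the function names and
the "Wrapper" key in the sample are hypothetical, chosen only for illustration.

```python
import json


def path_at(node, path):
    """True if the key path starts at this node, descending through arrays."""
    if not path:
        return True
    if isinstance(node, list):
        return any(path_at(item, path) for item in node)
    if isinstance(node, dict) and path[0] in node:
        return path_at(node[path[0]], path[1:])
    return False


def path_anywhere(node, path):
    """True if the key path starts at this node or at any nested node."""
    if path_at(node, path):
        return True
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return False
    return any(path_anywhere(child, path) for child in children)


def first_record_with_path(lines, dotted_path):
    """Return the 0-based index of the first record containing the path, else None.

    Assumption: each non-empty line holds exactly one JSON record.
    """
    path = dotted_path.split(".")
    index = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if path_anywhere(json.loads(line), path):
            return index
        index += 1
    return None


# Shortened records modeled on the fragments above; "Wrapper" is a made-up key.
sample = [
    '{"Wrapper": {"DealingSchedule": {"DealingTime": {"CutOffTimeDetail": '
    '[{"CutOffTime": "15:00", "DealingType": "3"}]}}}}',
    '{"Wrapper": {"DealingSchedule": {"ValuationTime": "12:00"}}}',
    '{"Wrapper": {"DealingSchedule": {"ValuationTime": "08:00", "DealingTime": '
    '{"DealingTimeDetail": [{"StartTime": "09:00", "EndTime": "17:00"}]}}}}',
]
print(first_record_with_path(
    sample, "DealingSchedule.DealingTime.DealingTimeDetail"))  # prints 2
```

For a real file, passing an open file object as `lines` streams it record by
record, so the whole 349 MB does not need to be in memory at once.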
1. If I remove the array brackets [] so that the value becomes a plain object
instead of a one-element array, it works:
[{"DealingTimeDetail_CountryId": "CU$$$$$GBR", "StartTime": "09:00", "EndTime":
"17:00"}]
to
{"DealingTimeDetail_CountryId": "CU$$$$$GBR", "StartTime": "09:00", "EndTime":
"17:00"}
2. I tried creating a smaller JSON file with just DealingSchedule objects, but
Drill read that file without errors.
3. If I add extra records to the array, it also returns an IOBE:
{"DealingTimeDetail": [
{"DealingTimeDetail_CountryId": "CU$$$$$GBR", "StartTime": "09:00", "EndTime":
"17:00"},
{"DealingTimeDetail_CountryId": "CU$$$$$GBR", "StartTime": "09:00", "EndTime":
"17:00"}
]}
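The manual edit described in item 1 can be sketched in Python as follows. This
only illustrates that data change (a one-element array replaced by its single
object). It is not a fix for the underlying bug, it alters the shape of the
data, and the function name is hypothetical.

```python
import copy


def unwrap_single_element_array(dealing_time, key="DealingTimeDetail"):
    """Return a copy where a one-element array under `key` becomes a plain object.

    Arrays with zero or several elements are left untouched, since they
    cannot be converted to a single object without losing information.
    """
    result = copy.deepcopy(dealing_time)
    value = result.get(key)
    if isinstance(value, list) and len(value) == 1:
        result[key] = value[0]
    return result


# The DealingTime fragment from the failing record shown earlier.
dealing_time = {
    "DealingTimeDetail": [
        {"DealingTimeDetail_CountryId": "CU$$$$$GBR",
         "StartTime": "09:00", "EndTime": "17:00"}
    ]
}
print(unwrap_single_element_array(dealing_time))
```

The input dictionary is deep-copied first, so the original record stays
unchanged and can still be used to reproduce the error.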
> IndexOutOfBoundsException when querying JSON files
> --------------------------------------------------
>
> Key: DRILL-5769
> URL: https://issues.apache.org/jira/browse/DRILL-5769
> Project: Apache Drill
> Issue Type: Bug
> Components: Server, Storage - JSON
> Affects Versions: 1.10.0
> Environment: *jdk_8u45_x64*
> *single drillbit running on zookeeper*
> *Following options set to TRUE:*
> drill.exec.functions.cast_empty_string_to_null
> store.json.all_text_mode
> store.parquet.enable_dictionary_encoding
> store.parquet.use_new_reader
> Reporter: David Lee
> Assignee: Jinfeng Ni
> Fix For: 1.10.0, 1.11.0, 1.12.0
>
> Attachments: 001.json, 100.json, 111.json
>
>
> *Running the following SQL on these three JSON files fails:*
> 001.json 100.json 111.json
> select t.id
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Error:*
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IndexOutOfBoundsException: index: 1024, length: 1 (expected: range(0, 1024))
> Fragment 0:0 [Error Id: xxxx.xxxx...
> *However running the same SQL on two out of three files works:*
> select t.id
> from dfs.`/tmp/1??.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/?1?.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/??1.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Changing the selected column from t.id to t.* also works: *
> select *
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
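As a reference for the wildcard queries in the issue description, the sketch
below lists which of the three attachments each pattern would select. It
assumes that `?` matches exactly one character, as in shell-style globs, and
uses Python's fnmatch module as a stand-in; Drill's own glob handling is not
verified here.

```python
from fnmatch import fnmatchcase

# Attachment names and path patterns taken from the issue description.
FILES = ["001.json", "100.json", "111.json"]
PATTERNS = ["???.json", "1??.json", "?1?.json", "??1.json"]


def matches(pattern, names=FILES):
    """Return the file names selected by a shell-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


for pattern in PATTERNS:
    print(pattern, matches(pattern))
```

Under that assumption, `???.json` selects all three files, `1??.json` and
`??1.json` each select two, and `?1?.json` selects only 111.json.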