[ https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chunhui Shi reassigned DRILL-5105:
----------------------------------
Assignee: Chunhui Shi
> Query time increases exponentially with increasing nested levels
> ----------------------------------------------------------------
>
> Key: DRILL-5105
> URL: https://issues.apache.org/jira/browse/DRILL-5105
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.9.0
> Environment: 3 Node Cluster with default memory and configurations.
> Reporter: Abhishek Girish
> Assignee: Chunhui Shi
>
> The time taken to query a JSON dataset depends on the number of nested levels
> within the dataset. Increasing the complexity of the dataset further
> increases the execution time.
> Tabulated below are cached query execution times for a simple select * query
> over two simple forms of JSON datasets:
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1 | 0.22 | 0.27 |
> | 2 | 0.23 | 0.25 |
> | 4 | 0.24 | 0.22 |
> | 8 | 0.22 | 0.23 |
> | 16 | 0.34 | 0.48 |
> | 24 | 25.76 | 72.51 |
> | 26 | 103.48 | 289.6 |
> | 28 | 336.12 | 1151.94 |
> | 30 | 1342.22 | 4586.79 |
> | 32 | 5360.2 | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging
> to dataset 1 and 10 belonging to dataset 2. Each file has 1 record, but the
> number of nested levels within it varies as given in the "No. Levels" column.
> It appears that the query time almost doubles with each additional nested
> level (note that from level 24 onward the table advances two levels per row,
> so this shows up as almost 4x between consecutive rows).
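The doubling claim can be checked directly against the reported numbers. The following is a minimal sketch, not part of the original report: it takes the timings for levels 24 to 32 from the table and derives a per-level growth factor for each pair of consecutive rows. The function name `growth_per_level` is made up for illustration, and the level-32 value for dataset 2 is left out because the report gives only an estimate for it.

```python
# Timings copied from the table: levels -> (dataset 1 seconds, dataset 2 seconds).
# The level-32 entry for dataset 2 is None because the report only estimates it.
timings = {
    24: (25.76, 72.51),
    26: (103.48, 289.6),
    28: (336.12, 1151.94),
    30: (1342.22, 4586.79),
    32: (5360.2, None),
}


def growth_per_level(timings, column):
    """Return {(lo, hi): per-level growth factor} for consecutive measured rows.

    `column` is 0 for dataset 1 and 1 for dataset 2 (hypothetical helper).
    """
    levels = sorted(lvl for lvl, row in timings.items() if row[column] is not None)
    factors = {}
    for lo, hi in zip(levels, levels[1:]):
        ratio = timings[hi][column] / timings[lo][column]
        # Spread the row-to-row ratio evenly over the levels between the rows.
        factors[(lo, hi)] = ratio ** (1.0 / (hi - lo))
    return factors


for column, name in ((0, "dataset 1"), (1, "dataset 2")):
    for (lo, hi), factor in growth_per_level(timings, column).items():
        print(f"{name}: levels {lo}->{hi}: x{factor:.2f} per level")
```

A factor close to 2 per level for every pair would be consistent with the roughly 4x jump between rows that are two levels apart.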
> The two representative datasets below showcase simple JSON structures with
> nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "level2": {
>       "field1": "b",
>       ...
>     }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "field2": {
>       "nfield1": true,
>       "nfield2": 1.1
>     },
>     "level2": {
>       "field1": "b",
>       "field2": {
>         "nfield1": false,
>         "nfield2": 2.2
>       },
>       ...
>     }
>   }
> }
> {code}
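For anyone trying to reproduce this, files of the two shapes above could be generated along the following lines. This is a rough sketch under stated assumptions: the report elides everything past level 2, so the values used for deeper levels (letters cycling through the alphabet, `nfield1` alternating, `nfield2` growing by 1.1 per level) are guesses that merely continue the visible pattern. The names `build_nested` and `depth` are hypothetical helpers, not anything from Drill.

```python
import json


def build_nested(levels, with_subobject=False):
    """Build one record containing `levels` nested "levelN" objects.

    with_subobject=False mimics Dataset 1; True adds the "field2" object
    seen in Dataset 2. Values past level 2 are extrapolated guesses and
    are not taken from the report.
    """
    if levels < 1:
        raise ValueError("levels must be >= 1")
    root = {}
    current = root
    for n in range(1, levels + 1):
        node = {"field1": chr(ord("a") + (n - 1) % 26)}
        if with_subobject:
            node["field2"] = {"nfield1": n % 2 == 1, "nfield2": round(n * 1.1, 1)}
        current[f"level{n}"] = node
        current = node  # the next level nests inside this one
    return root


def depth(record):
    """Count how many consecutive "levelN" keys are nested in `record`."""
    count = 0
    while f"level{count + 1}" in record:
        record = record[f"level{count + 1}"]
        count += 1
    return count


# Print a two-level record in the shape of Dataset 2 as a quick sanity check.
print(json.dumps(build_nested(2, with_subobject=True), indent=2))
```

Serializing `build_nested(n)` for each value in the "No. Levels" column, one record per file, should give inputs comparable to those described, though not necessarily byte-identical to the files used in the report.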