Abhishek Girish created DRILL-5105:
--------------------------------------

             Summary: Query time increases exponentially with increasing nested levels
                 Key: DRILL-5105
                 URL: https://issues.apache.org/jira/browse/DRILL-5105
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON
    Affects Versions: 1.9.0
         Environment: 3 Node Cluster with default memory and configurations. 
            Reporter: Abhishek Girish


The time taken to query any JSON dataset depends on the number of nested 
levels within the dataset. Increasing the complexity of the dataset further 
increases the execution time. 

Tabulated below are cached query execution times for a simple select * query 
over two simple forms of JSON datasets: 

|| # Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
| 1  | 0.22    | 0.27           |
| 2  | 0.23    | 0.25           |
| 4  | 0.24    | 0.22           |
| 8  | 0.22    | 0.23           |
| 16 | 0.34    | 0.48           |
| 24 | 25.76   | 72.51          |
| 26 | 103.48  | 289.6          |
| 28 | 336.12  | 1151.94        |
| 30 | 1342.22 | 4611.19        |
| 32 | 5360.2  | Expected: ~20k |

The above table lists query times for 20 different JSON files, 10 belonging to 
dataset 1 and 10 belonging to dataset 2. Each file has 1 record, but the number 
of nested levels within it varies as listed in the "# Levels" column. 

It appears that the query time almost doubles with the addition of each nested 
level. Note that in the table above, where the deeper levels are sampled in 
steps of 2, this translates to almost 4x between consecutive rows.
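
The per-level growth can be checked directly against the measured times. The following is a minimal sketch in Python, not part of the original test setup: it takes the rows from 24 levels upward and derives the average per-level multiplier between consecutive rows. The helper name per_level_factors is made up for illustration.

```python
# Query times (seconds) copied from the table above, keyed by nesting level.
# Only the rows from 24 levels upward are used, where times start to grow.
dataset1 = {24: 25.76, 26: 103.48, 28: 336.12, 30: 1342.22, 32: 5360.2}
dataset2 = {24: 72.51, 26: 289.6, 28: 1151.94, 30: 4611.19}


def per_level_factors(times):
    """Return {(lo, hi): factor}, where factor is the geometric-mean
    per-level growth between two consecutive measured levels."""
    levels = sorted(times)
    factors = {}
    for lo, hi in zip(levels, levels[1:]):
        ratio = times[hi] / times[lo]
        # Spread the ratio evenly over the (hi - lo) levels in between.
        factors[(lo, hi)] = ratio ** (1.0 / (hi - lo))
    return factors


for name, times in (("Dataset 1", dataset1), ("Dataset 2", dataset2)):
    for (lo, hi), factor in per_level_factors(times).items():
        print(f"{name}: levels {lo}->{hi}: x{factor:.2f} per level")
```

A multiplier close to 2 per level for every pair of rows is consistent with the exponential growth described above.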

The two representative datasets are shown below. They showcase simple JSON 
structures with nested levels.

Structure of Dataset 1:
{code}
{
  "level1": {
    "field1": "a",
    "level2": {
      "field1": "b",
      ...
    }
  }
}
{code}

Structure of Dataset 2:
{code}
{
  "level1": {
    "field1": "a",
    "field2": {
      "nfield1": true,
      "nfield2": 1.1
    },
    "level2": {
      "field1": "b",
      "field2": {
        "nfield1": false,
        "nfield2": 2.2
      },
      ...
    }
  }
}
{code}
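
To reproduce the issue, files of this shape can be generated for any number of levels. The following is a rough sketch in Python, not the script that produced the original files. The function name make_record is made up for illustration, and the field values beyond the two levels shown above (letters cycling through the alphabet, alternating booleans, multiples of 1.1) are assumptions, as is the innermost level carrying no further child.

```python
import json
import string


def make_record(levels, with_nested_fields=False):
    """Build one record nested `levels` deep.

    with_nested_fields=False mimics the shape of Dataset 1, True mimics
    Dataset 2. Values past the two levels shown in the report are assumed.
    """
    if levels < 1:
        raise ValueError("levels must be >= 1")
    inner = None
    # Build from the innermost level outwards.
    for k in range(levels, 0, -1):
        body = {"field1": string.ascii_lowercase[(k - 1) % 26]}
        if with_nested_fields:
            body["field2"] = {
                "nfield1": k % 2 == 1,          # true, false, true, ...
                "nfield2": round(k * 1.1, 1),   # 1.1, 2.2, 3.3, ...
            }
        if inner is not None:
            body["level%d" % (k + 1)] = inner
        inner = body
    return {"level1": inner}


# Print a 2-level record in the Dataset 1 shape as a quick sanity check.
print(json.dumps(make_record(2), indent=2))
```

Writing json.dumps(make_record(n, ...)) to one file per level count in the table gives one single-record file per row and dataset, which can then be queried with select *.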






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
