[jira] [Created] (DRILL-1716) Nested Data : There should be an easy way to apply aggregate functions on repeated types

Rahul Challapalli (JIRA) Fri, 14 Nov 2014 13:09:00 -0800

Rahul Challapalli created DRILL-1716:
----------------------------------------


             Summary: Nested Data : There should be an easy way to apply 
aggregate functions on repeated types
                 Key: DRILL-1716
                 URL: https://issues.apache.org/jira/browse/DRILL-1716
             Project: Apache Drill
          Issue Type: New Feature
          Components: Functions - Drill, Storage - JSON
            Reporter: Rahul Challapalli


{code}
{
    "company_id": 1,
    "evnts": [
        {
            "evnt_id": 999,
            "evnt_duration": 60
        },
        {
            "evnt_id": 998,
            "evnt_duration": 30
        },
        {
            "evnt_id": 997,
            "evnt_duration": 45
        }
    ]
}
{code}

For the above dataset, if I want to find the longest duration for each company 
id, below is how I would do it now

{code}
select sub.company_id , max(sub.evnt.evnt_duration) max_duration 
from ( 
    select company_id, flatten(evnts) evnt from `nested.json`
) sub 
group by sub.company_id;

+------------+--------------+
| company_id | max_duration |
+------------+--------------+
| 1          | 60           |
+------------+--------------+
{code}

Now if I want the evnt_id associated with the longest duration then we need one 
more join
{code}
select a.company_id, b.evnt.evnt_id 
from (
    select sub.company_id company_id, max(sub.evnt.evnt_duration) max_duration 
    from ( 
        select company_id, flatten(evnts) evnt from `nested.json`
    ) sub 
    group by sub.company_id
) a 
join 
(
    select flatten(evnts) evnt from `nested.json`
) b 
on a.max_duration = b.evnt.evnt_duration;
{code}

The above query currently fails (DRILL-1649). But that is how we have to do it 
currently.

It would be much simpler if I can do something like the below
{code}
select company_id, nested_agg('evnts', 'max', 'evnt_duration','evnt_id') as 
evnt_id from `nested.json`;
{code}

Apart from making the query much simpler to write this might enhance drill's 
performance as well.

Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (DRILL-1716) Nested Data : There should be an easy way to apply aggregate functions on repeated types

Reply via email to