Nested collections (e.g. JSON arrays) and drill queries

Evan Pollan Thu, 25 Oct 2012 11:51:52 -0700

Hi,

I attended Tomer's Strata/HadoopWorld presentation on Drill yesterday, and
was very impressed.  Lots of features that map directly to my needs.


He specifically cited support, on the HDFS side, for JSON/BSON, Avro, and
sequence files, and emphasized the ability to access nested data.  We use
JSON heavily, so it sounds like Drill would support basic queries over
nested properties within my dataset.  One question I didn't get the chance
to ask, though: what about querying over records with nested collections?
For example, I have some JSON datasets that look like:

{
    "propertyA": "valueA",
    "propertyB": [
        {
            "propertyX": "value1",
            "propertyY": "value2"
        },
        {
            "propertyX": "value3",
            "propertyY": "value4"
        }
    ]
}

In this case, I have users who would like to be able to access
propertyB.propertyX and use it in joins and aggregations.  Since each
record has N propertyB.propertyX values, though, how would Drill's
query planner and execution engine handle this?
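
To make the question concrete, here is a minimal Python sketch of one
possible semantics: flattening the nested collection so that each element
of propertyB becomes its own row, with the parent fields repeated.  This is
only an illustration of the behavior being asked about, not a description
of how Drill works; the flatten() helper and the dotted column names are
hypothetical.

```python
import json
from collections import Counter

# The sample record from above.
record = json.loads("""
{
    "propertyA": "valueA",
    "propertyB": [
        {"propertyX": "value1", "propertyY": "value2"},
        {"propertyX": "value3", "propertyY": "value4"}
    ]
}
""")


def flatten(record, array_key):
    """Yield one flat row per element of record[array_key].

    Hypothetical helper: parent fields are copied onto every row, and
    child fields are renamed to "<array_key>.<field>".
    """
    parent = {k: v for k, v in record.items() if k != array_key}
    for child in record.get(array_key, []):
        row = dict(parent)
        for field, value in child.items():
            row[f"{array_key}.{field}"] = value
        yield row


# One row per nested element; N elements produce N rows.
rows = list(flatten(record, "propertyB"))

# Once flattened, propertyB.propertyX can act as a join or grouping key.
counts = Counter(row["propertyB.propertyX"] for row in rows)
```

Under this interpretation, a record with N propertyB entries contributes N
rows to any join or aggregation on propertyB.propertyX, so aggregates over
parent fields such as propertyA would count that record N times.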

thanks,
Evan
