Hi,
I attended Tomer's Strata/HadoopWorld presentation on Drill yesterday and
was very impressed. Many of the features map directly to my needs.
He specifically cited support for JSON/BSON, Avro, and sequence files on
the HDFS side, and emphasized the ability to access nested data. We use
JSON heavily, so it sounds like Drill would support basic queries over
nested properties within my dataset. One question I didn't get the chance
to ask, though: what about querying over records with nested collections?
For example, I have some JSON datasets that look like:
    {
      "propertyA": "valueA",
      "propertyB": [
        {
          "propertyX": "value1",
          "propertyY": "value2"
        },
        {
          "propertyX": "value3",
          "propertyY": "value4"
        }
      ]
    }
In this case, my users would like to access propertyB.propertyX and use
it in joins and aggregations. Since each record has N propertyB.propertyX
values, though, I'm wondering how Drill's query planner and execution
engine would handle this.
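To make the question concrete, here is a rough Python sketch of one semantics I could imagine: flattening, similar to Hive's LATERAL VIEW explode, where each record with N propertyB entries becomes N rows that carry the parent's fields. This is only my own illustration, not Drill's API or behavior. The helper flatten_nested and the lookup table are hypothetical.

```python
import json
from collections import Counter

# The example record from above, as a JSON string.
RECORD = """
{
  "propertyA": "valueA",
  "propertyB": [
    {"propertyX": "value1", "propertyY": "value2"},
    {"propertyX": "value3", "propertyY": "value4"}
  ]
}
"""

def flatten_nested(record, list_key):
    """Hypothetical helper: emit one flat row per element of record[list_key],
    copying the parent's other fields onto each row (explode-style semantics)."""
    parent = {k: v for k, v in record.items() if k != list_key}
    rows = []
    for child in record.get(list_key, []):
        row = dict(parent)
        for k, v in child.items():
            row[f"{list_key}.{k}"] = v
        rows.append(row)
    return rows

# One record with two propertyB entries becomes two rows.
rows = flatten_nested(json.loads(RECORD), "propertyB")
for row in rows:
    print(row)

# Aggregation over the flattened column: count rows per propertyB.propertyX value.
counts = Counter(row["propertyB.propertyX"] for row in rows)
print(counts)

# Join: match flattened rows against a hypothetical lookup table keyed by propertyX.
lookup = {"value1": "label1", "value3": "label3"}
joined = [
    (r["propertyA"], r["propertyB.propertyX"], lookup[r["propertyB.propertyX"]])
    for r in rows
    if r["propertyB.propertyX"] in lookup
]
print(joined)
```

The other option I could see is keeping the collection as a repeated field and aggregating within each record (e.g. counting propertyB entries per record) without multiplying rows. I'm curious which of these, if either, Drill's planner would use.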
Thanks,
Evan