Excellent. Thanks for the prompt feedback!

On Oct 26, 2012, at 5:10 PM, Ted Dunning <[email protected]> wrote:

> The physical plan spec as it stands also includes an IMPLODE. The expected
> idiom there would be EXPLODE, FILTER, IMPLODE. This will retain the
> original structure, however.
>
> I think that what you are aiming at is similar to FLATTEN:
>
> https://developers.google.com/bigquery/docs/query-reference#flatten
> https://developers.google.com/bigquery/docs/data#flatten
>
> I haven't addressed this yet in the physical plan spec, but it should be
> pretty easily done. The sequence would be something like
> EXPLODE/FILTER/FLATTEN to get the result you want and FLATTEN would be
> similar to IMPLODE except that it would not glue the exploded field back
> together.
>
> (Julian has worried about the default flattening behavior in
> Dremel/BigQuery before... I don't know enough to have a strong opinion)
>
> On Fri, Oct 26, 2012 at 5:58 PM, Evan Pollan <[email protected]> wrote:
>
>> Thanks for the reply, Ted.
>>
>> What about for the simpler case of treating a nested collection as a
>> one-to-many table and leaving the EXPLODE'ed results intact, as if the
>> nested collection was JOIN'ed against its containing record?
>>
>> E.g. being able to select all the x.y values from the following two
>> records:
>> { x: [ {y: 1}, {y: 2}, {y: 3} ] }
>> { x: [ {y: 2}, {y: 4} ] }
>>
>> - as -
>>
>> 1
>> 2
>> 3
>> 2
>> 4
>>
>> In other words, does an EXPLODE always have to be followed by an
>> AGGREGATE?
>>
>> This statement in the BigQuery reference makes it sound like I might be
>> out of luck:
>>
>>     The WITHIN keyword specifically works with aggregate functions to
>>     aggregate across children and repeated fields within records and
>>     nested fields
>>
>> On Fri, Oct 26, 2012 at 12:47 AM, Ted Dunning <[email protected]> wrote:
>>
>>> If it is the within clause that you are interested in, at the physical
>>> plan layer, this is expressed as EXPLODE/AGGREGATE. Explode creates a
>>> batched data flow which contains values from the nested collection. The
>>> aggregate injects the results back into the original records.
>>>
>>> How this is implemented at the execution layer is more flexible. The
>>> EXPLODE/AGGREGATE pattern could be recognized and optimized into a loop
>>> that explicitly does the aggregation, especially for well-known
>>> aggregates.
>>>
>>> On Fri, Oct 26, 2012 at 12:43 AM, Ted Dunning <[email protected]> wrote:
>>>
>>>> Does the WITHIN clause help? In BigQuery, this is described here:
>>>>
>>>> https://developers.google.com/bigquery/docs/query-reference#within
>>>>
>>>> On Thu, Oct 25, 2012 at 2:51 PM, Evan Pollan <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I attended Tomer's Strata/HadoopWorld presentation on Drill yesterday,
>>>>> and was very impressed. Lots of features that map directly to my needs.
>>>>>
>>>>> He specifically cited support for, on the HDFS side, JSON/BSON, Avro,
>>>>> and sequence files and emphasized the ability to access nested data. We
>>>>> use JSON heavily, so it sounds like Drill would support base-case
>>>>> queries over nested properties within my dataset. One question I didn't
>>>>> get the chance to ask, though: what about querying over records with
>>>>> nested collections? For example, I have some JSON datasets that look
>>>>> like:
>>>>>
>>>>> {
>>>>>   "propertyA": "valueA",
>>>>>   "propertyB": [
>>>>>     {
>>>>>       "propertyX": "value1",
>>>>>       "propertyY": "value2"
>>>>>     },
>>>>>     {
>>>>>       "propertyX": "value3",
>>>>>       "propertyY": "value4"
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> In this case, I have users that would like to be able to access
>>>>> propertyB.propertyX and leverage it in joins and aggregations. Since
>>>>> each record has N propertyB.propertyX values, though, I'm wondering how
>>>>> Drill's query planner and execution engine would handle this?
>>>>>
>>>>> thanks,
>>>>> Evan
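
For anyone following the thread later, here is a rough Python sketch of how the three idioms discussed above differ on the two example records. It is only an illustration of the semantics as described in the messages, not Drill's physical plan or execution code; the function names (explode, flatten, implode, aggregate) and the output field name sum_y are made up for this example.

```python
# Illustrative only: these helpers are hypothetical and are not Drill operators.

def explode(records, field):
    """Yield one (parent_index, element) pair per element of the nested collection."""
    for i, rec in enumerate(records):
        for elem in rec.get(field, []):
            yield i, elem

def flatten(pairs, key):
    """Emit elem[key] for each exploded element without regrouping by parent."""
    return [elem[key] for _, elem in pairs]

def implode(records, field, pairs):
    """Glue the surviving elements back into copies of their parent records."""
    out = [dict(rec, **{field: []}) for rec in records]
    for i, elem in pairs:
        out[i][field].append(elem)
    return out

def aggregate(records, pairs, key, fn, name):
    """Apply fn to each parent's exploded values and inject the result as `name`."""
    groups = [[] for _ in records]
    for i, elem in pairs:
        groups[i].append(elem[key])
    return [dict(rec, **{name: fn(vals)}) for rec, vals in zip(records, groups)]

records = [
    {"x": [{"y": 1}, {"y": 2}, {"y": 3}]},
    {"x": [{"y": 2}, {"y": 4}]},
]

# EXPLODE/FLATTEN: one output row per nested element (the 1, 2, 3, 2, 4 case).
flat = flatten(explode(records, "x"), "y")

# EXPLODE/FILTER/IMPLODE: filter nested elements, keep the original structure.
kept = [(i, e) for i, e in explode(records, "x") if e["y"] >= 2]
imploded = implode(records, "x", kept)

# EXPLODE/AGGREGATE: one aggregate per parent record, injected back into it.
summed = aggregate(records, explode(records, "x"), "y", sum, "sum_y")
```

The parent index carried alongside each exploded element is what lets IMPLODE and AGGREGATE put results back on the right record; FLATTEN simply ignores it.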
