> Besides a lot of the use cases have multi valued dimensions which SQL standard doesn't support in general.
I'd be happy to try to make the multi-value/array functionality work for whatever your use case is, so if you have any feedback to give, either in this thread or on the proposal https://github.com/apache/incubator-druid/issues/7525, that would be great.

> On the note of SQL support, do you have know of any examples in Druid SQL
> where a sql aggregation function returns an array of doubles? I looked at
> DoubleSketchSqlAggregator but it seems to be returning a single double
> value.

If an existing agg/postagg combination (e.g. https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html) doesn't provide what you need, then depending on what it is you do need, something might be possible with the stuff I've been working on, though probably in a bit of a convoluted way (if at all).

Thus far I've only added what I would consider synthetic support for array types, since they can only exist within the expression system, or as the serialized output of an expression post aggregator. Internally in Druid there are still only single/multi value string columns and single value long/float/double columns, so the rest of the query processing system cannot operate directly on these array types.

So, expression virtual columns which produce arrays must be coerced back into a native Druid type, which currently means either a string or a multi-value string. If left as arrays, they automatically end up as a multi-value string. Using the 'array_to_string' function converts them into a single value, which in a sense allows grouping on the whole array. This joined string can then be fed into a 'string_to_array' expression post aggregator to split the string back into the correct array type at the surface result level.

Could you elaborate a bit more on what you are looking for?

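
To make the 'array_to_string' / 'string_to_array' round trip described above a bit more concrete, here is a rough sketch of the idea in plain Python. It is not Druid code: the two functions are hypothetical stand-ins named after the expressions, and the rows, the "tags" field, and the "," delimiter are made up for illustration.

```python
from collections import Counter


def array_to_string(values, delimiter):
    """Join array elements into one string (stand-in, not the Druid expression)."""
    return delimiter.join(str(v) for v in values)


def string_to_array(joined, delimiter):
    """Split a joined string back into its elements (stand-in, not the Druid expression)."""
    return joined.split(delimiter)


# Hypothetical input rows, each carrying an array value.
rows = [
    {"tags": ["a", "b"]},
    {"tags": ["a", "b"]},
    {"tags": ["c"]},
]

# Step 1: collapse each array into a single string so the whole array
# acts as one grouping key, then count rows per key.
counts = Counter(array_to_string(row["tags"], ",") for row in rows)

# Step 2: at the result level, split each key back into an array.
results = [
    {"tags": string_to_array(key, ","), "count": count}
    for key, count in counts.items()
]
print(results)
```

In this sketch the round trip only works if the delimiter never appears inside an element, and every element comes back as a string, so numeric arrays would need an extra conversion step.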
On Thu, Jun 27, 2019 at 10:44 PM Gian Merlino <g...@apache.org> wrote:

> Hey Samarth,
>
> > I think it would be a good contribution to add a select only certain
> > fields /projection feature for native queries. Not every team, for
> > example at my work, have adopted to use the Druid SQL. They just have
> > been so used to writing json queries ;). Besides a lot of the use cases
> > have multi valued dimensions which SQL standard doesn't support in
> > general.
>
> The SQL standard doesn't have anything really like our mutli-valued
> dimensions, but, that doesn't stop us from trying to make them work in SQL
> anyway. Clint has been doing a bunch of work here recently. Check out some
> of these related PRs:
>
> - https://github.com/apache/incubator-druid/pull/7588
> - https://github.com/apache/incubator-druid/pull/7973
> - https://github.com/apache/incubator-druid/pull/7974
>
> > On the note of SQL support, do you have know of any examples in Druid
> > SQL where a sql aggregation function returns an array of doubles? I
> > looked at DoubleSketchSqlAggregator but it seems to be returning a
> > single double value.
>
> I don't have an example, and I'm not sure if we've quite made it to arrays
> of doubles yet, but Clint may be able to chime in with something
> intelligent there.
>
> On Thu, Jun 27, 2019 at 1:44 PM Samarth Jain <samarth.j...@gmail.com>
> wrote:
>
> > Thanks for the reply, Gian. I am working on adding SQL support for the
> > t-digest module.
> >
> > I think it would be a good contribution to add a select only certain
> > fields /projection feature for native queries. Not every team, for
> > example at my work, have adopted to use the Druid SQL. They just have
> > been so used to writing json queries ;). Besides a lot of the use cases
> > have multi valued dimensions which SQL standard doesn't support in
> > general.
> >
> > On the note of SQL support, do you have know of any examples in Druid
> > SQL where a sql aggregation function returns an array of doubles? I
> > looked at DoubleSketchSqlAggregator but it seems to be returning a
> > single double value.
> >
> > On Wed, Jun 26, 2019 at 10:26 PM Gian Merlino <g...@apache.org> wrote:
> >
> > > Hey Samarth,
> > >
> > > This kind of thing doable in Druid SQL, which will only return the
> > > stuff you SELECT. Native queries don't have a concept like that, so
> > > they always return everything, even if you intended certain things to
> > > be 'internal' computations and aren't interested in seeing the results
> > > directly. If it makes sense for you to use SQL I would suggest going
> > > that route. Otherwise it might be interesting to add a native query
> > > feature to select only certain fields.
> > >
> > > On Wed, Jun 26, 2019 at 3:30 PM Samarth Jain <samarth.j...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I recently contributed TDigest based sketch aggregators in Druid. It
> > > > also included a post aggregator that lets you generate quantiles
> > > > from the aggregated sketches.
> > > >
> > > > Example query:
> > > >
> > > > {
> > > >   "queryType": "groupBy",
> > > >   "dataSource": "test_datasource",
> > > >   "granularity": "ALL",
> > > >   "dimensions": [],
> > > >   "aggregations": [{
> > > >     "type": "mergeTDigestSketch",
> > > >     "name": "merged_sketch",
> > > >     "fieldName": "ingested_sketch",
> > > >     "compression": 200
> > > >   }],
> > > >   "postAggregations": [{
> > > >     "type": "quantilesFromTDigestSketch",
> > > >     "name": "quantiles",
> > > >     "fractions": [0, 0.5, 1],
> > > >     "field": {
> > > >       "type": "fieldAccess",
> > > >       "fieldName": "merged_sketch"
> > > >     }
> > > >   }],
> > > >   "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
> > > > }
> > > >
> > > > The one limitation I have been running into is that the above query
> > > > returns both merged_sketch that was aggregated and the quantiles
> > > > array that was generated from applying post aggregation on
> > > > merged_sketch. What I would rather want in this case is for the
> > > > query to just return the quantiles array.
> > > >
> > > > So instead of
> > > >
> > > > "version": "v1",
> > > > "timestamp": "2019-06-25T00:00:00.000Z",
> > > > "event": {
> > > >   "quantiles": [
> > > >     0,
> > > >     162569.21411280808,
> > > >     5814934
> > > >   ],
> > > >   "merged_sketch": "AAAABBAXAS"
> > > > }
> > > >
> > > > I would prefer this:
> > > >
> > > > "version": "v1",
> > > > "timestamp": "2019-06-25T00:00:00.000Z",
> > > > "event": {
> > > >   "quantiles": [
> > > >     0,
> > > >     162569.21411280808,
> > > >     5814934
> > > >   ]
> > > > }
> > > >
> > > > Is there a way to achieve this today? I tried changing post
> > > > aggregation field access from
> > > >
> > > > "field": {
> > > >   "type": "fieldAccess",
> > > >   "fieldName": "merged_sketch"
> > > > }
> > > >
> > > > to
> > > >
> > > > "field": {
> > > >   "type": "finalizingFieldAccess",
> > > >   "fieldName": "merged_sketch"
> > > > }
> > > >
> > > > but that didn't help either.
> > > >
> > > > Thanks,
> > > > Samarth