Hi,
I recently contributed TDigest-based sketch aggregators to Druid. The
contribution also includes a post-aggregator that generates quantiles from
the aggregated sketches.
Example query:
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [{
    "type": "mergeTDigestSketch",
    "name": "merged_sketch",
    "fieldName": "ingested_sketch",
    "compression": 200
  }],
  "postAggregations": [{
    "type": "quantilesFromTDigestSketch",
    "name": "quantiles",
    "fractions": [0, 0.5, 1],
    "field": {
      "type": "fieldAccess",
      "fieldName": "merged_sketch"
    }
  }],
  "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
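For anyone who wants to reproduce this, here is a minimal Python sketch of building and submitting the query above. It assumes the native query endpoint /druid/v2 on a Broker listening at localhost:8082; the URL and the helper names build_quantile_query and run_query are placeholders of mine, not part of the extension.

```python
import json
import urllib.request

# Assumption: a Broker on the default local address; adjust for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2"


def build_quantile_query(datasource, sketch_field, fractions, interval,
                         compression=200):
    """Build the groupBy query shown above as a plain dict (hypothetical helper)."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "ALL",
        "dimensions": [],
        "aggregations": [{
            "type": "mergeTDigestSketch",
            "name": "merged_sketch",
            "fieldName": sketch_field,
            "compression": compression,
        }],
        "postAggregations": [{
            "type": "quantilesFromTDigestSketch",
            "name": "quantiles",
            "fractions": list(fractions),
            "field": {"type": "fieldAccess", "fieldName": "merged_sketch"},
        }],
        "intervals": [interval],
    }


def run_query(query, url=BROKER_URL):
    """POST the query as JSON and return the parsed response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_quantile_query(
    "test_datasource",
    "ingested_sketch",
    [0, 0.5, 1],
    "2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z",
)
# rows = run_query(query)  # needs a reachable Broker, so left commented out
```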
The one limitation I keep running into is that the above query returns
both merged_sketch (the aggregated sketch) and the quantiles array that the
post-aggregator generates from merged_sketch. In this case I would rather
have the query return only the quantiles array.
So instead of this:
{
  "version": "v1",
  "timestamp": "2019-06-25T00:00:00.000Z",
  "event": {
    "quantiles": [
      0,
      162569.21411280808,
      5814934
    ],
    "merged_sketch": "AAAABBAXAS"
  }
}
I would prefer this:
{
  "version": "v1",
  "timestamp": "2019-06-25T00:00:00.000Z",
  "event": {
    "quantiles": [
      0,
      162569.21411280808,
      5814934
    ]
  }
}
Is there a way to achieve this in the query itself today? I tried changing
the post-aggregator's field accessor from
"field": {
  "type": "fieldAccess",
  "fieldName": "merged_sketch"
}
to
"field": {
  "type": "finalizingFieldAccess",
  "fieldName": "merged_sketch"
}
but that did not help: merged_sketch is still returned.
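As a stopgap, the extra field can be dropped on the client side. Here is a minimal Python sketch, assuming the groupBy response has already been parsed into a list of dicts shaped like the rows above; the helper name drop_event_fields is hypothetical. Note that this only tidies the result. The sketch is still computed, serialized, and sent over the wire, which is why I would prefer a query-level option.

```python
def drop_event_fields(rows, unwanted):
    """Return copies of groupBy result rows without the unwanted event keys."""
    unwanted = set(unwanted)
    cleaned = []
    for row in rows:
        # Keep every event entry except the ones named in `unwanted`.
        event = {k: v for k, v in row["event"].items() if k not in unwanted}
        cleaned.append({**row, "event": event})
    return cleaned


# Sample row copied from the result shown above.
rows = [{
    "version": "v1",
    "timestamp": "2019-06-25T00:00:00.000Z",
    "event": {
        "quantiles": [0, 162569.21411280808, 5814934],
        "merged_sketch": "AAAABBAXAS",
    },
}]

trimmed = drop_event_fields(rows, ["merged_sketch"])
```

The input rows are left unmodified, so the original response stays available if the sketch is needed elsewhere.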
Thanks,
Samarth