datametrics opened a new issue #6035: quantile aggregation (99%) in druid from hive URL: https://github.com/apache/incubator-druid/issues/6035 Hi all together, i've got a question concerning data ingestion from inside hive. I use a query similar to this one for indexing a result view in Druid. use LONGFIELD_DWH; DROP TABLE IF EXISTS TestCube; EXPLAIN CREATE TABLE TestCube STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ( "druid.segment.granularity" = "MONTH", "druid.query.granularity" = "DAY") AS SELECT cast(DimDate.Name as timestamp with local time zone) as __time, cast(DimInstrument.Name as string) InsName, FactTable.TestMeasure * p.Quantity as TestMeasure, cast(DimInstrument.InstrumentType as string) InstrumentType, cast(FactTable.IdScenario as string) Scenario, cast(p.Portfolio as string) Portfolio, p.Quantity as Quantity FROM FactTable JOIN DimDate on (FactTable.IdDate = DimDate.Id) JOIN DimInstrument on (FactTable.IdInstrument = DimInstrument.Id) JOIN DimScenario on (FactTable.IdScenario = DimScenario.Id) JOIN (select pos.Name, pos.IdInstrument, pos.IdPortfolio,pf.Name as Portfolio, pos.Quantity from FactPosition pos join DimInstrument ins on ins.Id = pos.IdInstrument join DimPortfolio pf on pf.Id = pos.IdPortfolio) p on DimInstrument.Id = p.IdInstrument; That works without any issues. The datasource gets published in druid and is visible in superset after querying datasource metadata. A segment is created for every day in the facttable. As part of my analysis i would like to build a postaggregated quantile measure like described on the project documentation page: At first i create a new metric in superset "HISTOGRAM" of type "approxHistogramFold" with this JSON: { "type" : "approxHistogramFold", "name" : "HISTOGRAM", "fieldName" : "sum__testmeasure", "resolution" : 500, "numBuckets" : 500, "lowerLimit" : 0.0 } Next, i create another metric in superset "Response_99" (type = "postagg")that should display the fifth-worst (i.e. the 99% quantile) item from the list of 500 ordered testmeasure items from the approxhistogram measure (JSON): { "type": "quantile", "name": "HISTOGRAM", "fieldName": "HISTOGRAM", "probability": "0.99" } That measure combination runs but the output is "na". As far as I understand the documentation the field of the first aggregator for "testmeasure" has to be ingested with a specific metric-spec / ingestion aggregator similar to this: http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html Does anyone know how to incorporate this into the ingestion query issued in HIVE? If not, does anyone see how to mimic such quantile measures maybe without "approxHistogramFold" and probably in a simple javascript postaggregation? Thanks in advance!
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
