[GitHub] datametrics opened a new issue #6035: quantile aggregation (99%) in druid from hive

GitBox Sun, 22 Jul 2018 14:00:41 -0700

datametrics opened a new issue #6035: quantile aggregation (99%) in druid from 
hive
URL: https://github.com/apache/incubator-druid/issues/6035
 
 
   Hi all together,
   
   i've got a question concerning data ingestion from inside hive.
   I use a query similar to this one for indexing a result view in Druid.
   
   use LONGFIELD_DWH;
   DROP TABLE IF EXISTS TestCube;
   EXPLAIN CREATE TABLE TestCube
   STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
   TBLPROPERTIES (
   "druid.segment.granularity" = "MONTH",
   "druid.query.granularity" = "DAY")
   AS
   SELECT
   cast(DimDate.Name as timestamp with local time zone) as __time,
   cast(DimInstrument.Name as string) InsName,
   FactTable.TestMeasure * p.Quantity as TestMeasure,
   cast(DimInstrument.InstrumentType as string) InstrumentType,
   cast(FactTable.IdScenario as string) Scenario,
   cast(p.Portfolio as string) Portfolio,
   p.Quantity as Quantity
   FROM FactTable
   JOIN DimDate on (FactTable.IdDate = DimDate.Id)
   JOIN DimInstrument on (FactTable.IdInstrument = DimInstrument.Id)
   JOIN DimScenario on (FactTable.IdScenario = DimScenario.Id)
   JOIN (select pos.Name, pos.IdInstrument, pos.IdPortfolio,pf.Name as 
Portfolio,
   pos.Quantity from FactPosition pos
   join DimInstrument ins on ins.Id = pos.IdInstrument
   join DimPortfolio pf on pf.Id = pos.IdPortfolio) p
   on DimInstrument.Id = p.IdInstrument;
   
   That works without any issues. The datasource gets published in druid and is 
visible
   in superset after querying datasource metadata. A segment is created for 
every day
   in the facttable.
   
   As part of my analysis i would like to build a postaggregated quantile 
measure like
   described on the project documentation page:
   
   At first i create a new metric in superset "HISTOGRAM" of type 
"approxHistogramFold" with this JSON:
   
   {
   "type" : "approxHistogramFold",
   "name" : "HISTOGRAM",
   "fieldName" : "sum__testmeasure",
   "resolution" : 500,
   "numBuckets" : 500,
   "lowerLimit" : 0.0
   }
   
   Next, i create another metric in superset "Response_99" (type = 
"postagg")that should display the
   fifth-worst (i.e. the 99% quantile) item from the list of 500 ordered 
testmeasure items
   from the approxhistogram measure (JSON):
   
   {
   "type": "quantile",
   "name": "HISTOGRAM",
   "fieldName": "HISTOGRAM",
   "probability": "0.99"
   }
   
   That measure combination runs but the output is "na". As far as I understand 
the documentation the field
   of the first aggregator for "testmeasure" has to be ingested with a specific 
metric-spec / ingestion
   aggregator similar to this:
   
   
http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html
   
   Does anyone know how to incorporate this into the ingestion query issued in 
HIVE?
   If not, does anyone see how to mimic such quantile measures maybe without 
"approxHistogramFold" and
   probably in a simple javascript postaggregation?
   
   Thanks in advance!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] datametrics opened a new issue #6035: quantile aggregation (99%) in druid from hive

Reply via email to