cisco2825 opened a new issue, #14914:
URL: https://github.com/apache/druid/issues/14914
I have written a Druid topN query to fetch the last n transactions from the database. I aggregate the __time column with a longLast aggregator and use it as the metric for the topN query.
However, this query takes too long when the data spans many segments. Can someone please suggest optimizations to make the query faster, or another, faster way to fetch the last n transactions?
Here is the structure of the query:
```json
{
  "dataSource": {
    "type": "query",
    "query": {
      "dataSource": {
        "type": "table",
        "name": "txn"
      },
      "queryType": "topN",
      "intervals": [
        "2021-08-24T15:51:54.599+05:30/2023-08-24T10:21:54.599Z"
      ],
      "granularity": "all",
      "virtualColumns": [
        {
          "type": "expression",
          "name": "sourceValue",
          "outputType": "STRING",
          "expression": "regexp_extract(trim(transEntityRecords), '(.*)nslAttributeID(.*)909075081867,.values.:(..([-0-9]*(.[-0-9]+)?))(.*)', 4)"
        }
      ],
      "filter": {
        "type": "and",
        "fields": [
          {
            "type": "in",
            "dimension": "gsiId",
            "values": ["1595974428413"]
          },
          {
            "type": "in",
            "dimension": "triggerCuId",
            "values": ["1240994321229"]
          },
          {
            "type": "selector",
            "dimension": "triggerState",
            "value": "COMPLETED"
          },
          {
            "type": "or",
            "fields": [
              {
                "type": "selector",
                "dimension": "gsiName",
                "value": "MaxLimitLoadTestingGSI"
              },
              {
                "type": "selector",
                "dimension": "cuName",
                "value": "test CU27281"
              },
              {
                "type": "selector",
                "dimension": "cuIndex",
                "value": "1"
              },
              {
                "type": "selector",
                "dimension": "txnSlotItemDataName",
                "value": "NumberEntity 862"
              }
            ]
          }
        ]
      },
      "aggregations": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "doubleLast",
          "name": "sourceValue",
          "fieldName": "sourceValue"
        },
        {
          "type": "longLast",
          "name": "time",
          "fieldName": "__time"
        }
      ],
      "dimension": {
        "type": "default",
        "dimension": "transactionId",
        "outputName": "transactionId",
        "outputType": "STRING"
      },
      "threshold": 10000,
      "metric": "time"
    }
  },
  "queryType": "groupBy",
  "intervals": [
    "2021-08-24T15:51:54.599+05:30/2023-08-24T10:21:54.599Z"
  ],
  "granularity": "all",
  "aggregations": [
    {
      "type": "doubleMean",
      "name": "sourceMean",
      "fieldName": "sourceValue"
    },
    {
      "type": "count",
      "name": "count"
    }
  ],
  "dimensions": []
}
```
NOTE:
1) The interval needs to cover 2 years, so it cannot be reduced.
2) All the filters and virtual columns are essential.
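For comparison, one common pattern for "latest rows" in Druid is a Scan query ordered on __time descending, which avoids the per-transactionId topN aggregation entirely. This is only a sketch of the idea, and only equivalent under the assumption that each transactionId corresponds to a single row; the filter and virtual column from the query above are omitted here for brevity but would be carried over unchanged:

```json
{
  "queryType": "scan",
  "dataSource": "txn",
  "intervals": [
    "2021-08-24T15:51:54.599+05:30/2023-08-24T10:21:54.599Z"
  ],
  "columns": ["__time", "transactionId", "transEntityRecords"],
  "order": "descending",
  "limit": 10000
}
```

The downstream doubleMean over sourceValue would then need to be computed by the caller (or via a wrapping query), since Scan returns raw rows rather than aggregates.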
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]