DockerLive opened a new issue #10168:
URL: https://github.com/apache/druid/issues/10168
### Affected Version
Druid version : 0.18.0
### Description
- Use the Kafka-Indexing service to ingest real-time data
- The data Schema has a dimension field of type string;The real value, of
course, is a number, and is typed as a string to make it easier to use the
Bitmap index
- I had a problem converting this string type dimension field to a number
type in an SQL query for aggregate statistics.The statistics occasionally show
a value of 0
- There is no problem with corresponding statistics when the data is
solidified
- When I tried to make this field into a numerical metric, I had no problem
with statistics
- This issue did not occur in the previous version of the upgrade (0.15.0)
**Kafka-Indexing json:**
"dataSchema": {
"dataSource": "cell_monitor_v3_test",
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
},
"metricsSpec": [
{
"type": "longSum",
"name": "cell_count",
"fieldName": "cell_count",
"expression": null
},
{
"type": "longSum",
"name": "sum_status_continue_time",
"fieldName": "sum_status_continue_time",
"expression": null
}
],
"parser": {
"type": "string",
"parseSpec": {
"format": "json",
"timestampSpec": {
"column": "snapshot_time",
"format": "millis"
},
"dimensionsSpec": {
"dimensions": [
"space_status",
"reserve_status",
"dis_code",
"business_type",
"status_continue_time",
"has_exception"
]
}
}
},
"transformSpec": {
"transforms": [
{ "type": "expression", "name":
"sum_status_continue_time", "expression": "nvl(status_continue_time,0)" }
]
}
}
**As a string dimension, the query SQL is shown below, with a data statistic
of 0 in the ingestion Task interval :**
SELECT
__time,
SUM(cell_count),
round((SUM(CAST(status_continue_time AS NUMERIC))/2)/sum(cell_count),2),
SUM(CAST(status_continue_time AS NUMERIC))/2
FROM "cell_monitor_v3_test"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
AND space_status = 2
AND business_type > 0
AND has_exception = 0
GROUP BY __time
**As a numerical indicator, the query SQL is shown below, and the data
statistics are correct in the ingestion Task interval :**
SELECT
__time,
SUM(cell_count),
round((SUM(sum_status_continue_time)/2)/sum(cell_count),2),
SUM(sum_status_continue_time)/2
FROM "cell_monitor_v3_test"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
AND space_status = 2
AND business_type > 0
AND has_exception = 0
GROUP BY __time
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]