599166320 opened a new issue, #12929: URL: https://github.com/apache/druid/issues/12929
### Motivation

We use Druid to store a large amount of monitoring data. By default, the Kafka indexing service is used to ingest it, and query performance is very poor. Our analysis shows that, by default, monitoring data is written randomly across the partitions of the Kafka topic. Each Kafka peon consumes data at random and builds indexes to generate segments. At query time the broker can only prune segments by time; it cannot prune them further by the query's filter conditions, so it must scan a large number of real-time nodes, historical nodes, and segments, resulting in poor performance.

### Proposed changes

1. Add a new partition type to the Kafka real-time indexing service: a `KafkaPartitionNumberedShardSpec` class that extends `NumberedShardSpec`. Override the `possibleInDomain` function in `KafkaPartitionNumberedShardSpec` to implement hash-based filtering, and add the following core fields: `type = "kafka_partition"`, `kafkaPartitionIds`, `partitionDimensions`.
2. Modify the `KafkaIndexTask` class so that `KafkaIndexTask.newDriver` supports the `KafkaPartitionNumberedShardSpec` type.
3. Add a `partitionFunction` field to the Kafka ingestion spec configuration, and list in `partitionFunction` all fields that need to be hashed.
4. Add a simple hash step on the data-producing side:
```java
// pick the Kafka partition from the hash of the partition dimensions
int p = hash(dim1, dim2, ...);
// route the record to partition p (null key)
new ProducerRecord<byte[], byte[]>(topic, p, null, event);
```

### Rationale

Partitioning the data at write time benefits compression and sorting. At query time, the scan range is further trimmed by the user-defined filter conditions, improving the performance of concurrent queries.
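The proposal hinges on the producer and the broker agreeing on one hash function over the partition dimensions, so that an equality filter on those dimensions prunes the scan to a single Kafka partition. A minimal, self-contained Java sketch of that idea (all names here are hypothetical; this does not use Druid's actual `ShardSpec` API):

```java
import java.util.*;

// Hypothetical sketch: one shared hash over the partition dimensions,
// used both to route records and to prune partitions at query time.
public class HashPartitionPruning {

    // Producer side: pick the Kafka partition from the partition-dimension values.
    // floorMod keeps the result non-negative even for negative hash codes.
    public static int chooseKafkaPartition(List<String> dimValues, int numPartitions) {
        return Math.floorMod(dimValues.hashCode(), numPartitions);
    }

    // Query side: for an equality filter on all partition dimensions, only one
    // partition (and hence the segments built from it) can contain matching rows;
    // this is the pruning a possibleInDomain override would express.
    public static Set<Integer> possiblePartitions(List<String> filterValues, int numPartitions) {
        return Collections.singleton(chooseKafkaPartition(filterValues, numPartitions));
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        List<String> row = List.of("host01", "cpu");
        int p = chooseKafkaPartition(row, numPartitions);
        System.out.println("record goes to partition " + p);
        System.out.println("query prunes to " + possiblePartitions(row, numPartitions));
    }
}
```

A query that filters on only a subset of the partition dimensions cannot be pruned this way, which is why the hashed fields must be chosen to match the dominant query pattern.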
