iamruhua opened a new issue #10704: URL: https://github.com/apache/druid/issues/10704
### Description

We are currently working on an IoT project that will integrate a large number of devices, store the raw data (which is semi-structured), and run some simple statistical or analytical aggregations in real time. Millions of devices from hundreds of zones will be connected to the project (high cardinality).

Currently we use Kafka ingestion: Druid pulls the raw data and stores it. A second ingestion task then pulls the same Kafka topic again to aggregate by some dimensions (zoneID, typeOfDevice).

### Motivation

The problem is that we have to pull from Kafka multiple times for different purposes over an identical dataset. Is there any way we could add or configure a pipeline for this kind of process? For example, when we receive a device reading from deviceA of zoneA, save the reading into a table named raw_data, and then use the same data to calculate sum(reading.data) into an aggregated table named sum_by_zone. This could save a lot of bandwidth and computing resources.
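For context, the aggregating side of the setup described above can be expressed as a Druid Kafka supervisor spec with rollup enabled. This is only a rough sketch of one of the two ingestion specs; the field names (`zoneID`, `typeOfDevice`, `reading`), topic name, and broker address are assumptions based on the examples in this issue, not values from an actual deployment.

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "sum_by_zone",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["zoneID", "typeOfDevice"] },
      "metricsSpec": [
        { "type": "doubleSum", "name": "sum_reading", "fieldName": "reading" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      }
    },
    "ioConfig": {
      "topic": "device-readings",
      "consumerProperties": { "bootstrap.servers": "kafka:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

The `rollup: true` setting is what makes Druid pre-aggregate rows sharing the same dimension values and query-granularity bucket at ingestion time; the raw_data datasource would need its own separate supervisor reading the same topic, which is exactly the double-consumption this issue asks to avoid.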
