599166320 opened a new issue, #18055:
URL: https://github.com/apache/druid/issues/18055
### Description
In many real-world scenarios, we store data with the same schema in
different tables. Over time, the same logical dataset may be split into
multiple physical tables due to differences in retention periods, cost
considerations, or ingestion strategies.
These tables are often referenced in many places—some are used in Grafana
dashboards, while others are queried by backend applications. As the volume of
data grows, so do storage and maintenance costs.
You might suggest using Spark or Hadoop to reindex the data into a single
table, but that can be compute-intensive. Alternatively, unioning tables at
query time is viable, but updating hundreds or even thousands of queries across
systems is impractical.
### Proposal
To address this, I implemented a solution that automatically unions
multiple tables with the same schema at query time, controlled by a
mergeWithTables parameter in the query context.
Here’s an example:
```
{
  "mergeWithTables": {
    "t_a": "t_a1,t_a2,t_a3"
  }
}
```
- t_a is the primary table being queried.
- t_a1, t_a2, and t_a3 are additional tables containing data with the same
  schema but different time ranges or retention policies.

With this configuration, a query on t_a will automatically union data from
t_a1, t_a2, and t_a3.
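
To make the mapping concrete, here is a minimal Python sketch of how a context like the one above could be expanded into the list of datasources to read. It only illustrates the described behavior and is not the actual Broker code; the function name expand_datasources is hypothetical, and the handling of whitespace, empty entries, and duplicates is an assumption.

```python
from typing import Dict, List


def expand_datasources(context: Dict, table: str) -> List[str]:
    """Return the queried table followed by the tables mapped to it.

    Hypothetical helper: the issue only specifies the shape of the
    mergeWithTables context, not how it is parsed.
    """
    mapping = context.get("mergeWithTables") or {}
    names = [table]
    for name in mapping.get(table, "").split(","):
        name = name.strip()
        # Assumption: skip empty entries and avoid listing a table twice.
        if name and name not in names:
            names.append(name)
    return names


context = {"mergeWithTables": {"t_a": "t_a1,t_a2,t_a3"}}
print(expand_datasources(context, "t_a"))  # ['t_a', 't_a1', 't_a2', 't_a3']
print(expand_datasources(context, "t_b"))  # ['t_b']
```

A table with no entry in the mapping is returned on its own, so queries that do not opt in behave as before.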
### How It Works
When the query reaches the Broker:
1. The Broker reads the mergeWithTables context.
2. It retrieves the timeline and segment information for all involved
   datasources (t_a, t_a1, t_a2, t_a3) from TimelineServerView.
3. It constructs the segment Sequences for those datasources from the
   Historical nodes that serve them.
4. Results from all sources are merged transparently and returned as a single
   result set.
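
The flow above can be sketched as a toy model in Python. This is not Druid code: the TIMELINES dictionary stands in for the information the Broker gets from TimelineServerView, each segment is just a time-sorted list of rows, and the merge orders rows by timestamp only. How results are actually combined depends on the query type, so treat query_merged and the sample data as assumptions made for illustration.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, dict]  # (ISO-8601 timestamp, row values)

# Toy stand-in for per-datasource timelines: each datasource maps to a
# list of segments, and each segment is a time-sorted list of rows.
TIMELINES: Dict[str, List[List[Row]]] = {
    "t_a": [[("2025-05-01T00:00:00Z", {"count": 3})]],
    "t_a1": [
        [("2025-03-01T00:00:00Z", {"count": 1})],
        [("2025-04-01T00:00:00Z", {"count": 2})],
    ],
}


def query_merged(context: dict, table: str) -> Iterator[Row]:
    """Read every segment of the table and its mapped tables, merged by time."""
    # Step 1: read the mergeWithTables context.
    mapping = context.get("mergeWithTables") or {}
    extra = [n.strip() for n in mapping.get(table, "").split(",") if n.strip()]
    # Step 2: collect the segments of all involved datasources.
    segments: List[List[Row]] = []
    for name in [table] + extra:
        segments.extend(TIMELINES.get(name, []))
    # Steps 3 and 4: lazily merge the sorted per-segment streams into one.
    return heapq.merge(*segments, key=lambda row: row[0])


rows = query_merged({"mergeWithTables": {"t_a": "t_a1"}}, "t_a")
for timestamp, values in rows:
    print(timestamp, values)
```

The caller only names t_a, and rows from t_a1 appear in the same stream, which is the property that lets existing dashboards and applications keep their queries unchanged.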
This approach provides a powerful and low-friction way to manage table
partitioning by time or cost, without requiring changes in consuming systems.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]