egalpin opened a new issue, #12315: URL: https://github.com/apache/pinot/issues/12315
I know mass export isn’t a primary focus of Pinot (understandably, given the focus on real-time ingestion and low-latency querying), but there are use cases where mass export would be very useful. There are alternative options, but when upsert is employed and data consistency between exported data and aggregate results is important, serving both from the same source of data (i.e. Pinot) would be ideal.

Latency requirements would be very different for this use case, with minutes/hours (days?) being completely acceptable. The key to an appropriate solution would be minimizing the impact on servers and brokers, allowing them to continue serving low-latency queries.

One high-level concept would be something like: given a SQL query without any aggregations, generate minion tasks in the form of segment name + ID_SET of matching doc IDs based on the provided SQL query. For each task, a minion would then download the segment from server/deepstore, pluck out the matching documents based on the segment's corresponding ID_SET, apply the transformations from the SQL query for the provided projections, and write the resulting rows back to deepstore in CSV form (or Parquet, or a configurable format).

This approach would move much of the heavy lifting of mass export (disk seeks, long-running queries) onto minions, work that might otherwise be concerning for servers to handle alongside other queries.

This issue could serve as the starting place for brainstorming requirements and gauging community interest prior to undertaking a design document.

cc @mcvsubbu @mayankshriv
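
To make the concept above a bit more concrete, here is a rough Python sketch of the two phases (task planning, then per-segment export). This is not Pinot code: every name in it (`ExportTask`, `plan_tasks`, `run_task`, `SEGMENTS`) is hypothetical, an in-memory dict stands in for segments and deepstore, a plain frozenset stands in for a serialized ID_SET, and the planner evaluates the filter itself, whereas a real design would presumably obtain the ID_SET from the query engine.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ExportTask:
    """Hypothetical minion task: one segment plus the doc IDs to export."""
    segment_name: str
    doc_ids: FrozenSet[int]  # stand-in for a serialized ID_SET


def plan_tasks(segments: Dict[str, List[Row]],
               predicate: Callable[[Row], bool]) -> List[ExportTask]:
    """Planning phase: emit one task per segment that has matching docs."""
    tasks = []
    for name, rows in segments.items():
        matching = frozenset(i for i, row in enumerate(rows) if predicate(row))
        if matching:  # segments with no matches produce no task
            tasks.append(ExportTask(name, matching))
    return tasks


def run_task(task: ExportTask,
             segments: Dict[str, List[Row]],
             projections: Dict[str, Callable[[Row], object]]) -> str:
    """Export phase: pluck matching docs, apply projections, return CSV text."""
    rows = segments[task.segment_name]  # stand-in for downloading the segment
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(projections.keys())  # header row from projection names
    for doc_id in sorted(task.doc_ids):
        writer.writerow([fn(rows[doc_id]) for fn in projections.values()])
    return buf.getvalue()  # a real task would write this back to deepstore


# Toy data: three "segments", each a list of rows indexed by doc ID.
SEGMENTS = {
    "seg_0": [{"user": "a", "amount": 5}, {"user": "b", "amount": 50}],
    "seg_1": [{"user": "c", "amount": 70}],
    "seg_2": [{"user": "d", "amount": 1}],
}
# Roughly: SELECT user, amount * 100 AS amount_cents ... WHERE amount > 10
PROJECTIONS = {
    "user": lambda r: r["user"],
    "amount_cents": lambda r: r["amount"] * 100,
}
TASKS = plan_tasks(SEGMENTS, lambda r: r["amount"] > 10)
OUTPUTS = {t.segment_name: run_task(t, SEGMENTS, PROJECTIONS) for t in TASKS}

for name, text in OUTPUTS.items():
    print(name)
    print(text)
```

In this sketch, only the planning step touches the filter; the export step works purely from segment name + doc IDs, which is the property that would let it run on minions without involving servers or brokers.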
