egalpin opened a new issue, #12315:
URL: https://github.com/apache/pinot/issues/12315

   I know mass export isn't a primary focus of Pinot (understandably, given the 
focus on real-time ingestion and low-latency querying), but there are use cases 
where mass export would be very useful. Alternatives exist, but when upsert is 
employed and data consistency between exported data and aggregate results 
matters, serving both from the same source of data (i.e. Pinot) would be ideal.
   
   Latency requirements would be very different for this use case, with 
minutes/hours (days?) being completely acceptable. The key requirement for an 
appropriate solution is minimizing the impact on servers and brokers, so that 
they can continue serving low-latency queries.
   
   One high-level concept would be something like the following, given a SQL 
query without any aggregations:

   1. Generate minion tasks, each consisting of a segment name plus an ID_SET of 
      the doc IDs in that segment that match the provided SQL query.
   2. For each task, have a minion download the segment from the server/deep 
      store.
   3. Pluck out the matching documents using the segment's corresponding ID_SET.
   4. Apply the transformations from the SQL query for the requested projections.
   5. Write the resulting rows back to deep store as CSV (or Parquet, or another 
      configurable format).

   This approach would shift much of the heavy lifting of mass export (disk 
seeks, long-running queries) onto minions; that work could otherwise be a 
concern for servers, which must keep handling other queries at the same time.
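   To make the plan/execute split above concrete, here is a minimal in-memory 
sketch. Everything in it is hypothetical and not an existing Pinot API: 
`ExportTask`, `plan_export_tasks`, and `run_export_task` are illustrative 
names; a segment is modeled as a Python list of row dicts whose list index 
stands in for the doc ID; a `frozenset` stands in for ID_SET; a predicate 
callable stands in for the SQL filter; and a dict stands in for deep store.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ExportTask:
    """Hypothetical minion task: a segment name plus the doc IDs that matched."""
    segment_name: str
    doc_ids: frozenset  # stand-in for ID_SET


def plan_export_tasks(segments: Dict[str, List[Row]],
                      predicate: Callable[[Row], bool]) -> List[ExportTask]:
    """Planning step: evaluate the filter per segment, one task per segment with matches."""
    tasks = []
    for name, rows in sorted(segments.items()):
        # Assumption: the row's position in the list plays the role of its doc ID.
        matching = frozenset(doc_id for doc_id, row in enumerate(rows) if predicate(row))
        if matching:  # segments with no matching docs produce no task
            tasks.append(ExportTask(name, matching))
    return tasks


def run_export_task(task: ExportTask, rows: List[Row],
                    projections: Dict[str, Callable[[Row], object]]) -> str:
    """Minion step: pluck matching docs, apply projections, serialize as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(projections))  # header row: output column names
    for doc_id in sorted(task.doc_ids):  # deterministic doc-ID order
        row = rows[doc_id]
        writer.writerow([fn(row) for fn in projections.values()])
    return out.getvalue()


# Usage: a stand-in for
#   SELECT user, amount * 2 AS doubled FROM events WHERE amount > 100
segments = {
    "events_0": [{"user": "a", "amount": 10}, {"user": "b", "amount": 250}],
    "events_1": [{"user": "c", "amount": 5}, {"user": "d", "amount": 300}],
}
projections = {"user": lambda r: r["user"], "doubled": lambda r: r["amount"] * 2}

deep_store: Dict[str, str] = {}  # hypothetical stand-in for deep store
tasks = plan_export_tasks(segments, lambda r: r["amount"] > 100)
for task in tasks:
    deep_store[f"{task.segment_name}.csv"] = run_export_task(
        task, segments[task.segment_name], projections)
```

   Only the planning step has to touch the filter; each task is then 
independent, so tasks can be scheduled on minions at whatever rate keeps 
server and broker load acceptable.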
   
   This issue ticket could serve as the starting place for brainstorming 
requirements and community interest prior to undertaking a design document.
   
   cc @mcvsubbu @mayankshriv 

