bbull77 opened a new issue, #17709:
URL: https://github.com/apache/druid/issues/17709

   ### Description
   
   When "INSERT INTO EXTERN( s3()) AS CSV" is used as the prefix of a 
deep-storage MSQ query, the query output is written to the S3 location as 
intended. However, MSQ writes the output as multiple partition/part files in 
that location, so the caller has to assemble them into one file afterwards. 
   
   It would be very useful to have an option that makes a deep-storage MSQ 
query write its output to a single final CSV file instead. 
   
   
   ### Motivation
   
   Please provide the following for the desired feature or change:
   - A detailed description of the intended use case, if applicable
   
   A single CSV output file stored on S3 can be accessed conveniently, without 
knowing how multiple smaller partition sub-files must be assembled.
   
   - Rationale for why the desired feature/change would be beneficial
   
   Deep-storage MSQ queries are an important addition to Druid as a way to 
query cold/backup data when results are not needed urgently. Currently, 
getting the query result back is challenging for the user community, because 
each of the following methods has drawbacks:
   1. Getting the result through the Druid endpoint 
“druid/v2/sql/statements/{queryId}/results/page=[page$]&&resultFormat=csv” is 
sequential and very time-consuming. In one test, it took 30 minutes to fetch 
1 GB of query results with a curl HTTPS call running inside the same AWS 
region as the S3 bucket. 
   2. Using "INSERT INTO EXTERN( s3()) AS CSV", as described in the feature 
request above. The output is distributed across many smaller files. The 
caller of the deep-storage query needs to know the implementation details of 
how these files are named and sequenced in order to assemble them back into 
one complete query result. This is neither robust nor productive for the 
Druid user community.
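
   To illustrate the assembly step that method 2 currently pushes onto the 
caller, here is a minimal Python sketch. It is not Druid's implementation. It 
assumes the part files have already been downloaded as text, that their 
names contain a sequence number that gives the intended order, and that the 
caller knows whether each part repeats a header row. The file names and the 
`has_header` flag are hypothetical.

```python
import re


def part_sort_key(name: str):
    # Natural sort key: digit runs compare as integers, so a name containing
    # "10" sorts after one containing "2". re.split with a capturing group
    # always alternates text, digits, text, ... so the key types line up.
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]


def merge_csv_parts(parts: dict[str, str], has_header: bool = False) -> str:
    """Concatenate CSV part files (name -> text) into one CSV string.

    Assumption: ordering by the numbers embedded in the part names
    reproduces the intended row order. If has_header is True, the first
    line of every part after the first is dropped.
    """
    chunks = []
    for i, name in enumerate(sorted(parts, key=part_sort_key)):
        text = parts[name]
        if has_header and i > 0:
            # Drop the repeated header line of this part.
            _, _, text = text.partition("\n")
        if text and not text.endswith("\n"):
            # Make sure the next part starts on a new line.
            text += "\n"
        chunks.append(text)
    return "".join(chunks)


# Hypothetical part names, used only for illustration.
example = {
    "part-10.csv": "c,3\n",
    "part-2.csv": "b,2\n",
    "part-1.csv": "a,1",
}
merged = merge_csv_parts(example)
```

   Even this small sketch has to guess the naming scheme and the header 
convention, which is the fragility described above. A single-file output 
option would remove that guesswork.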
   

