bbull77 opened a new issue, #17709:
URL: https://github.com/apache/druid/issues/17709
### Description
When "INSERT INTO EXTERN(s3()) AS CSV" is used as the prefix of a deep-storage
MSQ query, the query output is written to the S3 location as expected, but it
then has to be assembled into one file, because MSQ writes the output as
multiple partitions/parts in that location.
It would be very useful to have an option that makes the deep-storage MSQ
query write a single final CSV file instead.
### Motivation
Please provide the following for the desired feature or change:
- A detailed description of the intended use case, if applicable
A single CSV output file stored on S3 can be accessed conveniently, without
knowing how the smaller partition sub-files have to be assembled.
- Rationale for why the desired feature/change would be beneficial
Deep-storage MSQ queries are an important addition to Druid as a way to
query cold/backup data on a non-urgent timeline. Currently, getting the
query result back is challenging for the user community, because each of
the following methods has drawbacks:
1. Getting the result through the Druid endpoint
"druid/v2/sql/statements/{queryId}/results?page=<page>&resultFormat=csv" is
sequential and very time-consuming (in one test, it took 30 minutes to fetch
1 GB of query results using a curl HTTPS call running in the same AWS region
as the S3 bucket).
2. Using "INSERT INTO EXTERN(s3()) AS CSV", as described in the feature
request above. The output is distributed across many smaller files. The
caller of the deep-storage query needs to know the implementation details of
how these files are named and sequenced in order to assemble them back into
one complete query result. This is neither error-proof nor productive for the
Druid user community.
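
To illustrate the assembly step that method 2 currently pushes onto the caller, here is a minimal Python sketch. It is not Druid code: `concat_csv_parts` is a hypothetical helper, and it assumes the caller has already listed the part files and opened them in the correct order, which is exactly the naming/sequencing detail this issue says callers should not need to know. Whether each part starts with a header row is also an assumption, so it is left as a flag.

```python
from typing import BinaryIO, Iterable


def concat_csv_parts(parts: Iterable[BinaryIO], out: BinaryIO,
                     has_header: bool = False) -> int:
    """Append each part to `out` in the order given; return the number of parts.

    Hypothetical helper: the caller must supply the parts already sorted.
    If has_header is True, the first line of every part after the first is
    dropped (assumes the header row contains no embedded newline).
    """
    count = 0
    for index, part in enumerate(parts):
        if has_header and index > 0:
            part.readline()  # discard the repeated header row
        last = b""
        # Copy in 1 MiB chunks so large parts are never held fully in memory.
        while chunk := part.read(1 << 20):
            out.write(chunk)
            last = chunk[-1:]
        # Keep rows from adjacent parts on separate lines.
        if last and last != b"\n":
            out.write(b"\n")
        count += 1
    return count
```

Even in this simple form the caller has to get the part order and the header handling right, which is the error-prone step a single-file output option would remove.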