cryptoe commented on issue #17709:
URL: https://github.com/apache/druid/issues/17709#issuecomment-2654237317
> Getting the result through the Druid endpoint
`druid/v2/sql/statements/{queryId}/results?page={page}&resultFormat=csv` is
sequential and very time-consuming (in one test it took 30 min to fetch 1 GB
of query results using a curl HTTPS call running in the same AWS region as
the S3 bucket).
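The page-by-page fetch the quote describes can be sketched roughly as below. This is a minimal illustration, not the reporter's actual script: the base URL, query id, and the `results_page_url` helper are assumptions, and only the endpoint shape follows the statements API path mentioned in the quote.

```python
from urllib.parse import urlencode

def results_page_url(base_url: str, query_id: str, page: int) -> str:
    """Build the URL for one page of statement results in CSV format.

    Illustrative helper; base_url and query_id come from the caller.
    """
    params = urlencode({"page": page, "resultFormat": "csv"})
    return f"{base_url}/druid/v2/sql/statements/{query_id}/results?{params}"

# Each page has to be requested in order, one HTTP call at a time -- this
# sequential loop is the slow pattern the quote is measuring (sketch only):
#
#   import urllib.request
#   page = 0
#   while True:
#       with urllib.request.urlopen(results_page_url(base, qid, page)) as resp:
#           body = resp.read()
#       if not body:
#           break
#       out.write(body)
#       page += 1
```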
If you could share a flamegraph of the broker while this query is running, I
can debug the cause of the slowness.
Regarding 2: the processing engine distributes its work across many tasks.
Doing what you describe would effectively require a single task in the final
stage, since ordering (among other things) must be preserved across tasks,
and that is not a scalable design.
For your use case, if you want a single output file, add a LIMIT to the
query. That forces the final stage to run as a single worker, which produces
a single CSV file on S3.
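A minimal sketch of that workaround, assuming the request is POSTed to the `/druid/v2/sql/statements` endpoint: the `statement_payload` helper, the sample SQL, and the exact payload shape are illustrative assumptions, not confirmed details from this thread.

```python
import json

def statement_payload(sql: str, limit: int) -> str:
    """Wrap a SQL query with an outer LIMIT as a JSON request body.

    Appending the LIMIT is what collapses the final stage to a single
    worker, so the export lands in one CSV file instead of many.
    """
    limited_sql = f"SELECT * FROM ({sql}) AS q LIMIT {limit}"
    return json.dumps({"query": limited_sql})

# Hypothetical usage; POST this body to <broker>/druid/v2/sql/statements:
payload = statement_payload("SELECT channel, cnt FROM wikipedia_rollup", 1_000_000)
```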
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]