GitHub user dosubot[bot] added a comment to the discussion: How to Handle Large 
Data Exports (10M+ Rows) in Superset?

<!-- Answer -->
To handle large data exports in Superset efficiently, especially for datasets 
over 10 million rows, consider the following strategies:

1. **Use Celery for Asynchronous Processing**: Superset can use Celery workers 
to run long queries in the background. The setup requires Celery workers, a 
message broker such as Redis or RabbitMQ, and a results backend. Long-running 
queries then no longer block the main application, which helps with large data 
exports 
<sup>[[1]](https://github.com/apache/superset/blob/master/docs/docs/configuration/async-queries-celery.mdx)</sup><sup>[[2]](https://github.com/apache/superset/blob/master/superset/async_events/async_query_manager.py)</sup>.

2. **Optimize CSV Downloads with S3**: If you query through Amazon Athena (a 
Presto-based engine), configure query results to persist to S3 automatically. 
Following the linked issue, enable the `DownloadCSVFromS3` and 
`ShowDefaultCSVOptions` feature flags, and set the necessary environment 
variables (`SUPERSET_REGION`, `SUPERSET_WORKGROUP`, `SUPERSET_ATHENA_DB`). This 
approach uses Athena's native S3 functionality to return a presigned S3 URL, 
which reduces download time for large CSV files 
<sup>[[3]](https://github.com/apache/superset/issues/31482)</sup>.

3. **Limitations and Constraints**: Be aware that Superset's current export 
functionality does not support pagination for CSV exports, which means the 
entire dataset is exported at once. This can lead to performance issues when 
dealing with large datasets 
<sup>[[4]](https://github.com/apache/superset/blob/master/superset-frontend/CHANGELOG.md)</sup>.
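
Because the built-in CSV export is not paginated, one workaround is to run the export outside Superset and stream rows to disk in fixed-size chunks. The following is a minimal sketch of that idea, not Superset code: it uses the standard-library `sqlite3` driver as a stand-in for your warehouse connection, and the names `iter_chunks`, `export_query_to_csv`, and the default `chunk_size` are hypothetical.

```python
import csv
import sqlite3
from typing import Iterator


def iter_chunks(cursor: sqlite3.Cursor, chunk_size: int) -> Iterator[list]:
    """Yield lists of at most chunk_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def export_query_to_csv(
    conn: sqlite3.Connection, sql: str, path: str, chunk_size: int = 50_000
) -> int:
    """Write the result of sql to a CSV file chunk by chunk.

    Only one chunk of rows is held in memory at a time.
    Returns the number of data rows written (header excluded).
    """
    cursor = conn.execute(sql)
    header = [col[0] for col in cursor.description]
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for rows in iter_chunks(cursor, chunk_size):
            writer.writerows(rows)
            written += len(rows)
    return written
```

With another DB-API driver the same pattern should apply, although whether `fetchmany` actually streams from the server (rather than buffering the full result client-side) depends on the driver and is worth checking for your database.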

The first two strategies can help manage memory usage and improve performance 
during large data exports in Superset; the third is a constraint to plan around.
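
For the Celery setup in item 1, a `superset_config.py` fragment along the following lines is a reasonable starting point. It is a sketch modeled on the pattern in the Superset async-queries documentation; the Redis URL, the `REDIS_URL` environment variable, and the tuning values are assumptions to adapt to your deployment.

```python
import os

# Assumption: Redis serves as both message broker and Celery result store.
# REDIS_URL is a hypothetical environment variable used here for illustration.
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")


class CeleryConfig:
    broker_url = REDIS_URL
    result_backend = REDIS_URL
    # Task modules the workers should load.
    imports = ("superset.sql_lab", "superset.tasks.scheduler")
    # Illustrative tuning values; adjust for your workload.
    worker_prefetch_multiplier = 10
    task_acks_late = True


# Superset reads this name from superset_config.py.
CELERY_CONFIG = CeleryConfig

# Async query results also need a RESULTS_BACKEND cache object, for example
# flask_caching's RedisCache. It is left out here so that this fragment has
# no third-party imports; see the linked Celery documentation for details.
```

Workers are started separately from the web server, for example with `celery --app=superset.tasks.celery_app:app worker`; check the linked documentation for the options that match your Superset version.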




GitHub link: 
https://github.com/apache/superset/discussions/33530#discussioncomment-13202543

----
This is an automatically sent email for notifications@superset.apache.org.
To unsubscribe, please send an email to: 
notifications-unsubscr...@superset.apache.org