I think this isn't right. I got a report from a user saying their CSV export today was 
just 1k rows, which confused me since I thought the limit for CSVs was much higher, 
like 1M. Reading the code I thought "hmm, I thought there were 2 different settings 
for this..." but couldn't find evidence of that, since this PR had changed it.

So when in `async` mode, we should apply the `SQL_MAX_ROW` limit (1M by default) 
when sending results to the results backend, so that the `csv/` endpoint can read 
up to that many rows in async mode. The `results/` endpoint in async mode (reading 
from the results backend) should return only the first `DISPLAY_SQL_MAX_ROW` rows, 
to prevent the UI from crashing. I think this was all working prior to this PR.
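To make the async-mode behavior above concrete, here's a minimal sketch. The function names are hypothetical (not Superset's actual API); only `SQL_MAX_ROW` and `DISPLAY_SQL_MAX_ROW` mirror the config keys mentioned, with illustrative defaults:

```python
# Illustrative sketch of the intended async-mode limits; function names
# are made up for clarity, not Superset internals.
SQL_MAX_ROW = 1_000_000        # cap on rows persisted to the results backend
DISPLAY_SQL_MAX_ROW = 1_000    # cap on rows sent to the browser

def rows_to_persist(rows):
    """Async: truncate to SQL_MAX_ROW before writing to the results backend."""
    return rows[:SQL_MAX_ROW]

def rows_for_results_endpoint(persisted_rows):
    """`results/` reads from the backend but returns only the first
    DISPLAY_SQL_MAX_ROW rows, so the UI doesn't choke on huge payloads."""
    return persisted_rows[:DISPLAY_SQL_MAX_ROW]

def rows_for_csv_endpoint(persisted_rows):
    """`csv/` streams everything that was persisted (up to SQL_MAX_ROW)."""
    return persisted_rows
```

So in async mode a 2M-row result gets persisted as 1M rows, the CSV export contains all 1M, and the browser only ever sees the first 1k.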

In non-async mode, we should apply `DISPLAY_SQL_MAX_ROW` on the `sql_json/` 
endpoint, and apply no limit on the `csv/` endpoint. I think this was not 
working before: we would just return `SQL_MAX_ROW` rows and crash the browser. 
This didn't happen much at Lyft and Airbnb since we run most big databases in 
async mode.
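The non-async rule is simple enough to sketch as a per-endpoint limit lookup. This is a hypothetical helper, not Superset code; `DISPLAY_SQL_MAX_ROW` mirrors the config key with an illustrative default:

```python
DISPLAY_SQL_MAX_ROW = 1_000  # illustrative default for the display cap

def sync_mode_limit(endpoint):
    """Non-async mode: `sql_json/` gets the display cap so the browser
    survives; `csv/` gets no limit so exports are complete."""
    if endpoint == "sql_json":
        return DISPLAY_SQL_MAX_ROW
    if endpoint == "csv":
        return None  # no cap: export everything the query returned
    raise ValueError(f"unknown endpoint: {endpoint!r}")
```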

Sorry, this is all a bit confusing and clearly needs to be documented better. We 
also need to let the user know what is happening here, and maybe allow them to 
change their limits, though we shouldn't let them crash their browser too easily.

@timifasubaa @villebro ^^^

[ Full content available at: 
https://github.com/apache/incubator-superset/pull/5392 ]