rumbin commented on issue #12005: URL: https://github.com/apache/superset/issues/12005#issuecomment-895567802
The problem with calculating the Boxplot metrics on the query result is that, even when distributing across a unique column, the row limit hits hard and silently: if the number of data points across all series exceeds the row limit, the resulting boxplot non-deterministically excludes data points without notifying the user. It is non-deterministic because no ORDER BY is applied, and the ordering is not configurable.

So the current Boxplot logic causes three issues:

1. There is no way to include all records once the number of rows exceeds the row limit.
2. Whether the row limit has been reached is not displayed anywhere.
3. The row limit excludes records in a non-deterministic fashion, as no explicit ordering is present.

In my eyes, all of these issues can best be addressed by calculating all Boxplot metrics per series directly within the SQL query. The only drawback that I can immediately see is that outliers cannot be returned by such a query...
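To illustrate the difference, here is a minimal Python sketch, not Superset's actual code. The data, the `ROW_LIMIT` value, and the helper names (`box_stats`, `stats_after_row_limit`, `stats_aggregated`) are all hypothetical, and a seeded shuffle stands in for the unspecified row order a database may return when no ORDER BY is present. It compares metrics computed on a row-limited result with metrics aggregated per series over all rows, which is roughly what an aggregating SQL query would return.

```python
import random
from statistics import quantiles


def box_stats(values):
    """Per-series summary: count plus the five numbers a boxplot needs."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def group_by_series(rows):
    """Collect the values of each series into a list."""
    grouped = {}
    for series, value in rows:
        grouped.setdefault(series, []).append(value)
    return grouped


# Hypothetical data: two series with 1,000 data points each.
rows = [(series, float(i)) for series in ("a", "b") for i in range(1000)]

ROW_LIMIT = 500  # hypothetical limit, smaller than the total row count


def stats_after_row_limit(rows, seed):
    """Metrics computed on the raw rows after the row limit was applied."""
    shuffled = rows[:]
    # Stand-in for the arbitrary order of a query without ORDER BY.
    random.Random(seed).shuffle(shuffled)
    limited = shuffled[:ROW_LIMIT]
    return {s: box_stats(v) for s, v in group_by_series(limited).items()}


def stats_aggregated(rows):
    """Metrics aggregated per series over all rows: one result row per series."""
    return {s: box_stats(v) for s, v in group_by_series(rows).items()}


limited = stats_after_row_limit(rows, seed=1)
full = stats_aggregated(rows)

for series in sorted(full):
    print(series, "limited:", limited[series])
    print(series, "full:   ", full[series])
```

The aggregated variant returns one row per series, so it stays far below any row limit, but it carries only the summary values and no individual points, which matches the outlier drawback mentioned above.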
