rumbin commented on issue #12005:
URL: https://github.com/apache/superset/issues/12005#issuecomment-895567802


   The problem with calculating the Boxplot metrics on the query result is 
that, even when distributing across a unique column, the row limit hits hard 
and silently:
   If the number of data points across all series exceeds the row limit, the 
resulting boxplot excludes data points non-deterministically, without 
notifying the user.
   Non-deterministically, because no ORDER BY is applied, and the ordering is 
not configurable either.
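
   To illustrate the effect, here is a minimal sketch in plain Python (not 
Superset code). The names `apply_row_limit` and `five_number_summary` and the 
limit of 10 are made up for the example. It assumes the database is free to 
return rows in any order when no ORDER BY is given, and simulates two such 
orders:

```python
import statistics


def apply_row_limit(rows, limit):
    """Keep only the first `limit` rows, as a row limit on a query would."""
    return rows[:limit]


def five_number_summary(values):
    """Return (min, q1, median, q3, max) for a list with at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))


# Hypothetical data: 100 data points, but a row limit of only 10.
all_rows = list(range(1, 101))
ROW_LIMIT = 10

# Without ORDER BY, either arrival order is a legitimate query result.
arrival_ascending = sorted(all_rows)
arrival_descending = sorted(all_rows, reverse=True)

full = five_number_summary(all_rows)
truncated_a = five_number_summary(apply_row_limit(arrival_ascending, ROW_LIMIT))
truncated_b = five_number_summary(apply_row_limit(arrival_descending, ROW_LIMIT))

print("all rows:   ", full)
print("truncated A:", truncated_a)
print("truncated B:", truncated_b)
```

   Both truncated summaries come from the same table and the same query, yet 
they describe disjoint slices of the data, and neither matches the summary of 
all rows.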
   
   So the current Boxplot logic causes three issues:
   1. There is no way to include all records once the number of rows exceeds 
the row limit.
   2. Whether the row limit has been reached is not displayed anywhere.
   3. The row limit excludes records non-deterministically, since no explicit 
ordering is present.
   
   In my eyes, all of these issues are best addressed by calculating all 
Boxplot metrics per series directly within the SQL query. The only drawback 
that I can immediately see is that outliers cannot be returned by such a 
query...
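
   A rough sketch of what such a query could look like, run against an 
in-memory SQLite database from Python. This is only an illustration under 
stated assumptions, not a proposal for the actual Superset implementation: 
the table `measurements`, its columns `series` and `value`, and the helper 
`boxplot_metrics_per_series` are hypothetical, the quartiles use the simple 
nearest-rank definition, and window functions require SQLite 3.25 or newer. 
Other databases offer dedicated percentile functions that would replace the 
`CASE` expressions.

```python
import sqlite3

# Nearest-rank quartiles: the p-th quantile is the value at rank ceil(p * n).
# With integer division, ceil(n / 4) == (n + 3) / 4, and so on.
BOXPLOT_QUERY = """
WITH ranked AS (
    SELECT series,
           value,
           ROW_NUMBER() OVER (PARTITION BY series ORDER BY value) AS rn,
           COUNT(*)     OVER (PARTITION BY series)                AS n
    FROM measurements
)
SELECT series,
       MIN(value)                                           AS min_value,
       MAX(CASE WHEN rn = (n + 3) / 4     THEN value END)   AS q1,
       MAX(CASE WHEN rn = (n + 1) / 2     THEN value END)   AS median,
       MAX(CASE WHEN rn = (3 * n + 3) / 4 THEN value END)   AS q3,
       MAX(value)                                           AS max_value
FROM ranked
GROUP BY series
ORDER BY series
"""


def boxplot_metrics_per_series(rows):
    """Load (series, value) pairs into SQLite and aggregate them per series.

    Returns a dict mapping series -> (min, q1, median, q3, max).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE measurements (series TEXT, value INTEGER)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
        result = conn.execute(BOXPLOT_QUERY).fetchall()
    finally:
        conn.close()
    return {series: tuple(metrics) for series, *metrics in result}


if __name__ == "__main__":
    data = [("a", v) for v in range(1, 9)] + [("b", v) for v in (10, 20, 30, 40)]
    for series, metrics in boxplot_metrics_per_series(data).items():
        print(series, metrics)
```

   The result has exactly one row per series, no matter how many data points 
each series contains, so a row limit would only matter once the number of 
series exceeds it. As noted above, the individual outliers are not part of 
this result.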


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


