[I] Time comparison does not always capture the correct data when the result set is limited [superset]

via GitHub Wed, 13 Mar 2024 17:20:36 -0700


eschutho opened a new issue, #27504:
URL: https://github.com/apache/superset/issues/27504


   ### Bug description
   
   When fetching a set of data over a period of time and adding a time 
comparison, if the result set is truncated, some of the comparison information 
will also be missing. 
   <img width="2324" alt="Screenshot 2024-03-13 at 4 55 31 PM" 
src="https://github.com/apache/superset/assets/5186919/87759e39-ef7e-4288-8dae-5082860951ce";>
   
   In the above example, I am trying to fetch the top 10 years from the years 
1989 - 2004 where the name Alex was the most popular and compare that count to 
the period 10 years prior. As you can see, the values for 1981, 1982, 1983, and 
1984 are missing, although I would expect to get those values, because as you 
can see, if I fetch more data by removing the row limit, those values exist. 
   
   <img width="2324" alt="Screenshot 2024-03-13 at 4 55 59 PM" 
src="https://github.com/apache/superset/assets/5186919/b58663c7-f38f-424f-995e-f7c25ce0e4b5";>
   
   I believe the reason why this bug exists is because we are making two 
separate db requests, one for each time period when the name is "Alex", and 
limiting each request by 10 and sorting on the count. Because the count for 
each response is going to be different, when we make the comparison, the values 
for the dates in the original time series may not exist. 
   
   Here are the db requests: 
   <img width="894" alt="Screenshot 2024-03-13 at 5 01 33 PM" 
src="https://github.com/apache/superset/assets/5186919/77f9b08c-a8b6-498b-8268-2eba5c2d6024";>
   <img width="894" alt="Screenshot 2024-03-13 at 5 01 59 PM" 
src="https://github.com/apache/superset/assets/5186919/f4647d8c-528e-480b-b702-b3a9aea8c3c5";>
   
   I believe what needs to happen instead is that the first query with 
timerange A needs to join the data from timerange B on a left outer join, and 
then apply the limit, and that would give the correct results. Something like:
   ```
   SELECT 
     query1.ds AS ds1,
     query1.count AS count1,
     query2.ds AS ds2,
     query2.count AS count2
   FROM 
     (
       SELECT 
         DATE_TRUNC('year', ds) AS ds,
         COUNT(*) AS count
       FROM 
         public.birth_names
       WHERE 
         ds >= TO_TIMESTAMP('1989-03-13 00:00:00.000000', 'YYYY-MM-DD 
HH24:MI:SS.US') AND
         ds < TO_TIMESTAMP('2004-03-13 00:00:00.000000', 'YYYY-MM-DD 
HH24:MI:SS.US') AND
         name IN ('Alex')
       GROUP BY 
         DATE_TRUNC('year', ds)
       ORDER BY 
         ds DESC
     ) AS query1
   LEFT JOIN 
     (
       SELECT 
         DATE_TRUNC('year', ds) AS ds,
         COUNT(*) AS count
       FROM 
         public.birth_names
       WHERE 
         ds >= TO_TIMESTAMP('1979-03-13 00:00:00.000000', 'YYYY-MM-DD 
HH24:MI:SS.US') AND
         ds < TO_TIMESTAMP('1994-03-13 00:00:00.000000', 'YYYY-MM-DD 
HH24:MI:SS.US') AND
         name IN ('Alex')
       GROUP BY 
         DATE_TRUNC('year', ds)
       ORDER BY 
         ds DESC
     ) AS query2 
   ON 
     query1.ds = query2.ds + INTERVAL '10 years'
   ORDER BY 
     query1.ds DESC
   LIMIT 10;
   ```
   <img width="1926" alt="Screenshot 2024-03-13 at 5 16 56 PM" 
src="https://github.com/apache/superset/assets/5186919/e1c9b3f3-934d-4666-a342-f9751ac3466c";>
   
   
   ### How to reproduce the bug
   
   1. create a line chart with a date range
   2. create a time comparison to the chart
   3. truncate the values by limiting the top rows
   
   ### Screenshots/recordings
   
   See above
   
   ### Superset version
   
   master / latest-dev
   
   ### Python version
   
   3.10
   
   ### Node version
   
   18 or greater
   
   ### Browser
   
   Chrome
   
   ### Additional context
   
   I am using the examples db on Postgres with the birth_names dataset
   
   ### Checklist
   
   - [X] I have searched Superset docs and Slack and didn't find a solution to 
my problem.
   - [X] I have searched the GitHub issue tracker and didn't find a similar bug 
report.
   - [ ] I have checked Superset's logs for errors and if I found a relevant 
Python stacktrace, I included it here as text in the "additional context" 
section.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org
For additional commands, e-mail: notifications-h...@superset.apache.org

[I] Time comparison does not always capture the correct data when the result set is limited [superset]

Reply via email to