michaelritsema commented on issue #28610:
URL: https://github.com/apache/superset/issues/28610#issuecomment-2147846348

   If you had a denormalized table with a persons information called 
PersonSightings
   
   with a row like:
   
   person_id:1,f_name:evan,l_name:r, address:"343232 W. Palm St 
Jacksonville,Flordia",:sighting_timestamp:2024-01-01
   
   imagine there was a new row for everytime a camera spotted the person
   
   You would want to take advantage of the fact that most of the columsn are 
functionality depenendt on person_id
   
   Currently to get a report on distinct sightings you'd need to add most of 
the columns as a dimension which generates:
   
   SELECT person_id,f_name,l_name, address,sighting_timestamp,count(distinct 
sighting_timestamp)
   from PersonSightings
   GROUP BY person_id,f_name,l_name, address,sighting_timestamp
   
   The main problem is the GROUP BY is very inefficient. I'd want SQL Like
   
   SELECT person_id,any(f_name),any(l_name), 
any(address),any(sighting_timestamp),count(distinct sighting_timestamp)
   from PersonSightings
   GROUP BY person_id
   
   To do this I need to turn those dimensions into metrics and add custom sql. 
But they should be treated as dimensions not metrics.
   
   So I proposed a way to put an aggregate around a dimension so superset can 
apply ANY or FIRST (whatever the database supports)
   
   In my use case I actually have a few dozen of these and using clickhouse 
which can be exponentially faster with smaller GROUP BYs
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to