On Mon, Jul 6, 2015 at 5:50 PM, David G. Johnston < david.g.johns...@gmail.com> wrote:
> On Mon, Jul 6, 2015 at 6:16 PM, Michael Nolan <htf...@gmail.com> wrote: > >> > But you can see it wont give correct results since (for example) >> Monday's >> > with no new users will not be counted in the average as 0. >> >> One way to handle this is to union your query with one that has a >> generate_series (0,6) for the DOW column and nulls for the other >> columns, then treat both that and your original query as a subquery >> and do your averages, since nulls are not included in either count() >> or average() aggregates: >> >> select dow, count(*), avg(some_column) from ( >> select extract ('dow' from some_date) as dow, some_number from some_table >> union select generate_series(0,6) as dow, null as some_number) as x >> group by 1 order by 1 >> > > I'm not seeing how this is at all useful. > > As you said, the average function ignores the null introduced by the union > so the final answer with and without the union is the same. > > No matter how you work a "generate_series(0,6)" based query it will never > be able to give a correct answer expect accidentally. Each actual missing > date contributes a ZERO to the numerator and a ONE to the denominator in > the final division that constitutes the mean-average. You must have those > dates. > > In a series with four Mondays ( 4, 0, 0, 8 ) the average desired is 3, not > 6 (or 4). There is no way to make the denominator (number of Mondays) 4 > instead of 3 by using generate_series(0,6). > > David J. > > > > Ah, you're right. The problem is that avg() is going to treat missing data as missing (of course.) It will either be necessary to add in the missing days as a zero value (but ONLY the missing days, requiring some kind of 'not exists' select, I suppose) or to 'roll your own' average function by adding in the missing days as I did with a union in my earlier post. The real problem is the DOW is not the field where the missing data is, it is in the underlying date field. I created a test dataset. It has 1 day missing in a two-week period from June 1st through June 14th (Sunday, June 7th). Here's what the OP's SQL generates: Day New Users --- ---------------------- Sun 2.0000000000000000 Mon 4.5000000000000000 Tue 2.0000000000000000 Wed 4.5000000000000000 Thu 1.00000000000000000000 Fri 3.0000000000000000 Sat 3.0000000000000000 Here's the SQL to generate the missing day and do the average function by hand: select "Day", "New Users" from ( select dow, "Day", sum(total) / count(distinct created) as "New Users"from (select extract(dow from created) as dow, to_char(created,'Dy') as "Day", created, created2, total from (select created, created as created2, count(*) as total from users group by 1, 2 union (select generate_series('2015-06-01 00:00'::timestamp, '2015-06-14'::timestamp,'1 day')::date, null, 0) ) as x) as y group by 1, 2) as z order by dow Day New Users --- ---------------------- Sun 1.00000000000000000000 Mon 4.5000000000000000 Tue 2.0000000000000000 Wed 4.5000000000000000 Thu 1.00000000000000000000 Fri 3.0000000000000000 Sat 3.0000000000000000 -- Mike Nolan no...@tssi.com