On Mon, Jul 6, 2015 at 5:50 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Mon, Jul 6, 2015 at 6:16 PM, Michael Nolan <htf...@gmail.com> wrote:
>
>> > But you can see it wont give correct results since (for example)
>> Monday's
>> > with no new users will not be counted in the average as 0.
>>
>> One way to handle this is to union your query with one that has a
>> generate_series (0,6) for the DOW column and nulls for the other
>> columns, then treat both that and your original query as a subquery
>> and do your averages, since nulls are not included in either count()
>> or average() aggregates:
>>
>> select dow, count(*), avg(some_column) from (
>> select extract ('dow' from some_date) as dow, some_number from some_table
>> union select generate_series(0,6) as dow, null as some_number) as x
>> group by 1 order by 1
>>
>
> ​I'm not seeing how this is at all useful.
>
> As you said, the average function ignores the null introduced by the union
> so the final answer with and without the union is the same.
>
> No matter how you work a "generate_series(0,6)" based query it will never
> be able to give a correct answer expect accidentally.  Each actual missing
> date contributes a ZERO to the numerator and a ONE to the denominator in
> the final division that constitutes the mean-average.  You must have those
> dates.
>
> In a series with four Mondays ( 4, 0, 0, 8 ) the average desired is 3, not
> 6 (or 4).  There is no way to make the denominator (number of Mondays) 4
> instead of 3 by using generate_series(0,6).
>
> David J.
>
>
>
>

Ah, you're right.  The problem is that avg() is going to treat missing data
as missing (of course.)  It will either be necessary to add in the missing
days as a zero value (but ONLY the missing days, requiring some kind of
'not exists' select, I suppose) or to 'roll your own' average function by
adding in the missing days as I did with a union in my earlier post.

The real problem is the DOW is not the field where the missing data is, it
is in the underlying date field.

I created a test dataset.  It has 1 day missing in a two-week period from
June 1st through June 14th (Sunday, June 7th).  Here's what the OP's SQL
generates:

Day       New Users
--- ----------------------
Sun     2.0000000000000000
Mon     4.5000000000000000
Tue     2.0000000000000000
Wed     4.5000000000000000
Thu 1.00000000000000000000
Fri     3.0000000000000000
Sat     3.0000000000000000

Here's the SQL to generate the missing day and do the average function by
hand:

select "Day", "New Users" from (
select dow, "Day", sum(total) / count(distinct created) as "New Users"from
(select extract(dow from created) as dow,
to_char(created,'Dy') as "Day", created, created2, total from

(select created, created as created2, count(*) as total from users
group by 1, 2
union
(select generate_series('2015-06-01 00:00'::timestamp,
'2015-06-14'::timestamp,'1 day')::date, null, 0) ) as x) as y
group by 1, 2) as z
order by dow


Day       New Users
--- ----------------------
Sun 1.00000000000000000000
Mon     4.5000000000000000
Tue     2.0000000000000000
Wed     4.5000000000000000
Thu 1.00000000000000000000
Fri     3.0000000000000000
Sat     3.0000000000000000

--
Mike Nolan
no...@tssi.com

Reply via email to