Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco

 ​I am fairly certain this does not give you the correct results.
 Specifically, the minimum value for each cDate is going to be 1 since
 count(*) counts NULLs.  count(u) should probably work.
 ​

 Yes you are right, I forgot to change COUNT(*) to COUNT(id), as you
mention COUNT(u.*) will also work. I just couldn't get the idea of
generating a sequence form 0 to 6 to work correctly. The approach I'm using
seems to give the correct results (with COUNT(u.id)).


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread David G. Johnston
Please follow list conventions and either respond inline or bottom-post.

On Mon, Jul 6, 2015 at 3:30 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 Paul, I'm sure I'm missing something but it seems like your approach will
 not work. It's because the LEFT OUTER JOIN is on the numeric day of the
 week. So if you had this query going over weeks or months of data wouldn't
 you have the same issue with the days that had no new users not being
 factored into the AVG?  I ended up doing something like this, which seems
 to work pretty well.

 WITH usersByDay AS (
 SELECT cDate, COUNT(*) AS total
 FROM (
 SELECT generate_series(
 {CALENDAR_INTERVAL.START}::DATE,
 {CALENDAR_INTERVAL.END}::DATE,
 interval '1 day')::DATE AS cDate
 ) AS c
 LEFT OUTER JOIN users u ON u.created::DATE = c.cDate
 GROUP BY cDate),


​I am fairly certain this does not give you the correct results.
Specifically, the minimum value for each cDate is going to be 1 since
count(*) counts NULLs.  count(u) should probably work.

SELECT dt, count(uid), count(*)
FROM generate_series('2015-01-01'::date, '2015-01-05'::date, '1
day'::interval) gs (dt)
LEFT JOIN (VALUES ('2015-01-01'::date, 1),
('2015-01-01',2),('2015-01-02',3)) users (dt, uid)
USING (dt)
GROUP BY dt
​;​

​David J.
​


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread David G. Johnston
On Mon, Jul 6, 2015 at 4:40 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 ​I am fairly certain this does not give you the correct results.
 Specifically, the minimum value for each cDate is going to be 1 since
 count(*) counts NULLs.  count(u) should probably work.
 ​

 Yes you are right, I forgot to change COUNT(*) to COUNT(id), as you
 mention COUNT(u.*) will also work. I just couldn't get the idea of
 generating a sequence form 0 to 6 to work correctly. The approach I'm using
 seems to give the correct results (with COUNT(u.id)).


​Correct. generate_series(0,6) won't work since there is no context as
whether it is supposed to cover a single week or multiple years or anything
in between.​

Any non-null column can be supplied to the count() function: count ignores
nulls.  In this case you want to ignore the placeholder null that you are
creating during the left join.  My original suggestion avoided these extra
placeholder values and instead forces you to process the master date range
and the user-by-date pieces separately and then substitute 0 for any master
date where the corresponding user-by-date was missing.  If performance were
important it may be worth testing both versions otherwise my guess is this
version is more readable (for you).

David J.


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

 I'm not sure how to create a result where I get the average number of
 new users per day of the week. My issues are that days that did not
 have any new users will not be factored into the average

This is a pretty common problem with time-series queries when there is 
sparse data. My go-to solution is to use generate_series---in your case 
from 0 to 6---then do a left join from there to your actual data.


Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
would solve this problem. Wouldn't I have to generate a series based on the
date range (by day) and then group by DOW _after_ that? Can you give me an
example of how I'd do it with a series based on 0 to 6?

On Mon, Jul 6, 2015 at 10:58 AM, Paul Jungwirth p...@illuminatedcomputing.com
 wrote:

  I'm not sure how to create a result where I get the average number of
  new users per day of the week. My issues are that days that did not
  have any new users will not be factored into the average

 This is a pretty common problem with time-series queries when there is
 sparse data. My go-to solution is to use generate_series---in your case
 from 0 to 6---then do a left join from there to your actual data.

 Paul





 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread David G. Johnston
On Mon, Jul 6, 2015 at 2:04 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 Wouldn't I have to generate a series based on the date range (by day) and
 then group by DOW _after_ that?


​You are correct.​

WITH userdays (dow, user_count) AS ( existing_query, more or less )
​, day_counts (dow, count_of_days) AS ( SELECT
generate_series(user_earliest_created_date, user_most_recent_created_date)
)​
SELECT dow, coalesce(user_count, 0) / count_of_days
FROM day_counts
LEFT JOIN userdays USING (dow)
​;​

David J.
​


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
would solve this problem. Wouldn't I have to generate a series based on
the date range (by day) and then group by DOW _after_ that? Can you give
me an example of how I'd do it with a series based on 0 to 6?


Looks like David Johnston beat me to it! :-) But this is what I had in mind:

SELECT  s.d AS dow,
COUNT(u.id) c
FROMgenerate_series(0, 6) s(d)
LEFT OUTER JOIN users u
ON  EXTRACT(dow FROM created) = s.d
GROUP BY dow
ORDER BY dow
;

You can also get human-readable DOW names by creating a 7-row CTE table 
and joining to it based on the numeric dow.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
Paul, I'm sure I'm missing something but it seems like your approach will
not work. It's because the LEFT OUTER JOIN is on the numeric day of the
week. So if you had this query going over weeks or months of data wouldn't
you have the same issue with the days that had no new users not being
factored into the AVG?  I ended up doing something like this, which seems
to work pretty well.

WITH usersByDay AS (
SELECT cDate, COUNT(*) AS total
FROM (
SELECT generate_series(
{CALENDAR_INTERVAL.START}::DATE,
{CALENDAR_INTERVAL.END}::DATE,
interval '1 day')::DATE AS cDate
) AS c
LEFT OUTER JOIN users u ON u.created::DATE = c.cDate
GROUP BY cDate),
avgUsersByDOW AS (
SELECT extract('dow' FROM cDate) AS nDay,
 to_char(cDate,'Dy') AS Day,
ROUND(AVG(total), 2) AS New Users
FROM usersByDay
GROUP BY 1, 2
ORDER BY 1)
SELECT Day, New Users FROM avgUsersByDOW ORDER BY nDay




On Mon, Jul 6, 2015 at 11:30 AM, Paul Jungwirth p...@illuminatedcomputing.com
 wrote:

 Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
 would solve this problem. Wouldn't I have to generate a series based on
 the date range (by day) and then group by DOW _after_ that? Can you give
 me an example of how I'd do it with a series based on 0 to 6?


 Looks like David Johnston beat me to it! :-) But this is what I had in
 mind:

 SELECT  s.d AS dow,
 COUNT(u.id) c
 FROMgenerate_series(0, 6) s(d)
 LEFT OUTER JOIN users u
 ON  EXTRACT(dow FROM created) = s.d
 GROUP BY dow
 ORDER BY dow
 ;

 You can also get human-readable DOW names by creating a 7-row CTE table
 and joining to it based on the numeric dow.

 Paul




Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread David G. Johnston
On Mon, Jul 6, 2015 at 6:16 PM, Michael Nolan htf...@gmail.com wrote:

  But you can see it wont give correct results since (for example) Monday's
  with no new users will not be counted in the average as 0.

 One way to handle this is to union your query with one that has a
 generate_series (0,6) for the DOW column and nulls for the other
 columns, then treat both that and your original query as a subquery
 and do your averages, since nulls are not included in either count()
 or average() aggregates:

 select dow, count(*), avg(some_column) from (
 select extract ('dow' from some_date) as dow, some_number from some_table
 union select generate_series(0,6) as dow, null as some_number) as x
 group by 1 order by 1


​I'm not seeing how this is at all useful.

As you said, the average function ignores the null introduced by the union
so the final answer with and without the union is the same.

No matter how you work a generate_series(0,6) based query it will never
be able to give a correct answer expect accidentally.  Each actual missing
date contributes a ZERO to the numerator and a ONE to the denominator in
the final division that constitutes the mean-average.  You must have those
dates.

In a series with four Mondays ( 4, 0, 0, 8 ) the average desired is 3, not
6 (or 4).  There is no way to make the denominator (number of Mondays) 4
instead of 3 by using generate_series(0,6).

David J.


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Michael Nolan
On 7/6/15, Robert DiFalco robert.difa...@gmail.com wrote:
 I'm not sure how to create a result where I get the average number of new
 users per day of the week. My issues are that days that did not have any
 new users will not be factored into the average, giving an overinflated
 result.

 This is what I started with:

 WITH userdays AS
   (SELECT u.created::DATE AS created,
   to_char(u.created,'Dy') AS d,
   COUNT(*) AS total
FROM users u
GROUP BY 1,2),
 userdays_avg AS
   (SELECT extract('dow'
   FROM created) AS nDay,
   d AS Day,
   AVG(total) AS New Users
FROM userdays
GROUP BY 1,2
ORDER BY 1)
 SELECT Day, New Users
 FROM userdays_avg
 ORDER BY nDay;


 But you can see it wont give correct results since (for example) Monday's
 with no new users will not be counted in the average as 0.

One way to handle this is to union your query with one that has a
generate_series (0,6) for the DOW column and nulls for the other
columns, then treat both that and your original query as a subquery
and do your averages, since nulls are not included in either count()
or average() aggregates:

select dow, count(*), avg(some_column) from (
select extract ('dow' from some_date) as dow, some_number from some_table
union select generate_series(0,6) as dow, null as some_number) as x
group by 1 order by 1

--
Mike Nolan
no...@tssi.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Michael Nolan
On Mon, Jul 6, 2015 at 5:50 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Mon, Jul 6, 2015 at 6:16 PM, Michael Nolan htf...@gmail.com wrote:

  But you can see it wont give correct results since (for example)
 Monday's
  with no new users will not be counted in the average as 0.

 One way to handle this is to union your query with one that has a
 generate_series (0,6) for the DOW column and nulls for the other
 columns, then treat both that and your original query as a subquery
 and do your averages, since nulls are not included in either count()
 or average() aggregates:

 select dow, count(*), avg(some_column) from (
 select extract ('dow' from some_date) as dow, some_number from some_table
 union select generate_series(0,6) as dow, null as some_number) as x
 group by 1 order by 1


 ​I'm not seeing how this is at all useful.

 As you said, the average function ignores the null introduced by the union
 so the final answer with and without the union is the same.

 No matter how you work a generate_series(0,6) based query it will never
 be able to give a correct answer expect accidentally.  Each actual missing
 date contributes a ZERO to the numerator and a ONE to the denominator in
 the final division that constitutes the mean-average.  You must have those
 dates.

 In a series with four Mondays ( 4, 0, 0, 8 ) the average desired is 3, not
 6 (or 4).  There is no way to make the denominator (number of Mondays) 4
 instead of 3 by using generate_series(0,6).

 David J.





Ah, you're right.  The problem is that avg() is going to treat missing data
as missing (of course.)  It will either be necessary to add in the missing
days as a zero value (but ONLY the missing days, requiring some kind of
'not exists' select, I suppose) or to 'roll your own' average function by
adding in the missing days as I did with a union in my earlier post.

The real problem is the DOW is not the field where the missing data is, it
is in the underlying date field.

I created a test dataset.  It has 1 day missing in a two-week period from
June 1st through June 14th (Sunday, June 7th).  Here's what the OP's SQL
generates:

Day   New Users
--- --
Sun 2.
Mon 4.5000
Tue 2.
Wed 4.5000
Thu 1.
Fri 3.
Sat 3.

Here's the SQL to generate the missing day and do the average function by
hand:

select Day, New Users from (
select dow, Day, sum(total) / count(distinct created) as New Usersfrom
(select extract(dow from created) as dow,
to_char(created,'Dy') as Day, created, created2, total from

(select created, created as created2, count(*) as total from users
group by 1, 2
union
(select generate_series('2015-06-01 00:00'::timestamp,
'2015-06-14'::timestamp,'1 day')::date, null, 0) ) as x) as y
group by 1, 2) as z
order by dow


Day   New Users
--- --
Sun 1.
Mon 4.5000
Tue 2.
Wed 4.5000
Thu 1.
Fri 3.
Sat 3.

--
Mike Nolan
no...@tssi.com


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Michael Nolan
Here's a minor refinement that doesn't require knowing the range of dates
in the users table:

(select created, created as created2, count(*) as total from users
group by 1, 2
union
(select generate_series(
(select min(created)::timestamp from users),
(select max(created)::timestamp from users),
'1 day')::date, null, 0) ) as x) as y
group by 1, 2) as z
order by dow

Day   New Users
--- --
Sun 1.
Mon 4.5000
Tue 2.
Wed 4.5000
Thu 1.
Fri 3.
Sat 3.

--
Mike Nolan