Github user decibel commented on the pull request:

    https://github.com/apache/incubator-madlib/pull/41#issuecomment-216642133
  
    I tried reviewing this patch, but I'm having great difficulty because I 
don't really know the details of what it's trying to accomplish. Maybe that's 
because I'm a database guy and not a statistician, but there's no documentation 
and very little in the way of comments.
    
    What would be very helpful to me would be docs/comments about what the 
module is trying to provide, as well as docstrings for any functions whose 
purpose is non-obvious. There also needs to be explanation about the handling 
of NULLs. If the normal aggregate behavior of ignoring NULLs isn't desired, I 
think something like avg(coalesce({observed_col}, 0)) is better than re-rolling 
the aggregate functions by hand.
    
    Making multiple independent queries against {table_in} will be very, very 
costly if that table is large. Additionally, creating lots of temp tables 
bloats the catalog. It's much better to write a single query that produces your 
output.
    
    Finally, please take a look at the built-in statistics aggregate functions 
(http://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE),
 and see if any of them will do what's necessary. If not then if I knew what 
the deficiencies were we could propose addressing them with the Postgres 
community.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to