[GitHub] madlib pull request #220: Add more stats to summary function

fmcquillan99 Fri, 22 Dec 2017 16:56:28 -0800

Github user fmcquillan99 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/220#discussion_r158572631
  
    --- Diff: src/ports/postgres/modules/summary/Summarizer.py_in ---
    @@ -199,6 +200,22 @@ class Summarizer:
             args['max_columns'] = ','.join([minmax_type('max', c) for c in 
cols])
     
             args['ntile_columns'] = "array_to_string(array[NULL], ',')"
    +
    +        args['positive_columns'] = ','.join(["sum(case when {0} > 0 \
    +                                           then 1 else 0 
end)".format(c['attname'])
    +                                          if c['typname'] in numeric_types
    +                                          else 'NULL' for c in cols])
    +
    +        args["negative_columns"] = ','.join(["sum(case when {0} < 0 \
    +                                           then 1 else 0 
end)".format(c['attname'])
    +                                          if c['typname'] in numeric_types
    +                                          else 'NULL' for c in cols])
    +
    +        args["zero_columns"] = ','.join(["sum(case when {0} = 0 \
    --- End diff --
    
    Orhan when you say "comparison" between FLOATs, then yes, we do need to 
consider rounding.    
    
    But if we are checking *explicit* equality to 0 in summary() function, then 
we should not worry about rounding, it seems to me.  So I am changing my 
opinion here, and suggest that this PR that does checking = 0 is fine.  We can 
put it in the user docs that this is what we are doing.   
    
    Others please comment if you think this is an OK approach.

---

[GitHub] madlib pull request #220: Add more stats to summary function

Reply via email to