GitHub user rashmi815 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/138#discussion_r120770959
--- Diff: src/ports/postgres/modules/summary/summary.py_in ---
@@ -7,84 +7,73 @@
"""
import plpy
from time import time
-from utilities.utilities import __mad_version
+
+from utilities.control import MinWarning
from Summarizer import Summarizer
-version_wrapper = __mad_version()
-_get_vector = version_wrapper.select_vecfunc()
+
def summary(schema_madlib, source_table, output_table, target_cols,
grouping_cols,
- get_distinct, get_quartiles, ntile_array, how_many_mfv, get_estimates):
+ get_distinct, get_quartiles, ntile_array, how_many_mfv,
+ get_estimates, n_cols_per_run):
"""
- Main summary function that is called by SQL to execute summary
+ Main summary function that is called by SQL to compute summary
statistics on a table.
- @param schema_madlib Madlib Schema namespace
- @param source_table Name of input table
- @param output_table Name of output table
- @param target_cols Names of specific columns for which to get summary
- @param grouping_cols Names of columns on which to group-by
- (no summary is provided for these columns)
- @param get_distinct Should summary include distinct count
- @param get_quartiles Should summary include quartile information
- @param ntile_array Array for quantiles to include in summary
- (each element should be in [0, 1])
- @param how_many_mfv How many frequent values to output?
- @param get_estimates Should the summmary information be estimated or exact?
+ @param schema_madlib Madlib Schema namespace
+ @param source_table Name of input table
+ @param output_table Name of output table
+ @param target_cols Names of specific columns for which to get summary
+ @param grouping_cols Names of columns on which to group-by
+ (no summary is provided for these columns)
+ @param get_distinct Should summary include distinct count
+ @param get_quartiles Should summary include quartile information
+ @param ntile_array Array for quantiles to include in summary
+ (each element should be in [0, 1])
+ @param how_many_mfv How many frequent values to output?
+ @param get_estimates Should the summmary information be estimated or exact?
--- End diff --
Should the docstring here also have an @param entry for the new n_cols_per_run argument?
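
For illustration only, here is a rough sketch of the kind of entry I have in mind. It assumes n_cols_per_run is the maximum number of columns summarized in a single pass over the table; that meaning is my guess from the name, and split_into_runs is a hypothetical helper, not code from this PR.

```python
def split_into_runs(target_cols, n_cols_per_run):
    """
    Hypothetical helper: group column names into batches, one per pass.

    @param target_cols     List of column names to summarize
    @param n_cols_per_run  Assumed meaning: maximum number of columns
                           summarized in a single pass over the table
    """
    if n_cols_per_run < 1:
        raise ValueError("n_cols_per_run must be a positive integer")
    # Slice the column list into consecutive chunks of at most
    # n_cols_per_run names; the last chunk may be shorter.
    return [target_cols[i:i + n_cols_per_run]
            for i in range(0, len(target_cols), n_cols_per_run)]
```

If the parameter means something else, the docstring line should of course say that instead; the point is only that the new argument is currently undocumented.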