GitHub user rashmi815 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/138#discussion_r120751194
--- Diff: src/ports/postgres/modules/summary/summary.sql_in ---
@@ -162,31 +163,52 @@ following columns:
<td>Array containing the frequency count for each of the most
frequent values. </td>
</tr>
</table></dd>
+
<dt>target_columns (optional)</dt>
<dd>TEXT, default NULL. A comma-separated list of columns to summarize. If
NULL, summaries are produced for all columns.</dd>
+
<dt>grouping_cols (optional)</dt>
-<dd>TEXT, default: null. A comma-separated list of columns on which to
+<dd>TEXT, default: null. A comma-separated list of columns on which to
group results. If NULL, summaries are produced on the complete table.</dd>
-@note Please note that summary statistics are calculated for each grouping
+@note Please note that summary statistics are calculated for each grouping
column independently. That is, grouping columns are not combined together
-as in the regular PostgreSQL style GROUP BY directive. (This was done
+as in the regular PostgreSQL style GROUP BY directive. (This was done
to reduce long run time and huge output table size which would otherwise
-result in the case of large input tables with a lot of grouping_cols and
+result in the case of large input tables with a lot of grouping_cols and
target_cols specified.)
+
<dt>get_distinct (optional)</dt>
<dd>BOOLEAN, default TRUE. If true, distinct values are counted.</dd>
+
<dt>get_quartiles (optional)</dt>
<dd>BOOLEAN, default TRUE. If TRUE, quartiles are computed.</dd>
+
<dt>ntile_array (optional)</dt>
<dd>FLOAT8[], default NULL. An array of quantile values to compute. If
NULL, quantile values are not computed.</dd>
-@note Quartile and quantile functions are not available for PostgreSQL 9.3 or
-lower. If you are using PostgreSQL 9.3 or lower, the output table will not
-contain these values, even if you set 'get_quartiles' = TRUE or
+@note Quartile and quantile functions are not available for PostgreSQL 9.3 or
+lower. If you are using PostgreSQL 9.3 or lower, the output table will not
+contain these values, even if you set 'get_quartiles' = TRUE or
provide an array of quantile values for the parameter 'ntile_array'.
+
<dt>how_many_mfv (optional)</dt>
<dd>INTEGER, default: 10. The number of most-frequent-values to
compute.</dd>
+
<dt>get_estimates (optional)</dt>
<dd>BOOLEAN, default TRUE. If TRUE, estimated values are produced for
distinct values and most frequent values. If FALSE, exact values are calculated
(may take longer to run depending on data size).</dd>
+
+<dt>n_cols_per_run (optional)</dt>
+<dd>INTEGER, default: 15. The number of columns to collect summary statistics in
+one pass of the data.
+This parameter determines the number of passes through the data. For e.g.,
+with a total of 40 columns to summarize and 'n_cols_per_run = 15', there will
--- End diff --
Needs the word "be" at the end of the line "with a total of 40 columns to
summarize and 'n_cols_per_run = 15', there will "
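As a sanity check on the quoted example, the pass count should be the ceiling of the column count divided by `n_cols_per_run`. For 40 columns and a value of 15, that is 3 passes. The sketch below is a hypothetical Python illustration and not MADlib code. `plan_passes` is an invented helper, and it assumes the columns are split into consecutive batches of at most `n_cols_per_run`, with one batch per pass:

```python
import math


def plan_passes(columns, n_cols_per_run=15):
    """Hypothetical helper (not part of MADlib): split the columns to
    summarize into consecutive batches, assuming one batch per pass."""
    if n_cols_per_run < 1:
        raise ValueError("n_cols_per_run must be a positive integer")
    return [columns[i:i + n_cols_per_run]
            for i in range(0, len(columns), n_cols_per_run)]


# Hypothetical column names for the 40-column example in the diff.
cols = [f"col_{i}" for i in range(1, 41)]
batches = plan_passes(cols, 15)

print(len(batches))               # 3
print([len(b) for b in batches])  # [15, 15, 10]

# The number of passes equals ceil(total_columns / n_cols_per_run).
assert len(batches) == math.ceil(len(cols) / 15)
```

Under this assumption, the last pass covers only the 10 leftover columns. Each pass handles at most 15 columns, not exactly 15.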