Repository: madlib
Updated Branches:
  refs/heads/master dd906f7c7 -> 47e357446

Correlation: Fix bug with international characters

JIRA:MADLIB-1186

Additional Author: Nandish Jayaram

If the name of an independent-variable column passed to
madlib.correlation(...) contains quotes, the query fails because plain
string concatenation was used to build the name of the intermediate
column that holds that column's average. This commit uses add_postfix()
to build that name instead.

Previously the intermediate column was named `avg_{column_name}`; it is
now add_postfix(column_name, '_avg'), so the prefix `avg_` becomes the
suffix `_avg`. This column is only an intermediate and never appears in
the output, so the change in the final name does not matter.

Closes #214

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/47e35744
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/47e35744
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/47e35744

Branch: refs/heads/master
Commit: 47e357446e39be6662eb472b1b523280c831b360
Parents: dd906f7
Author: Swati Soni
Authored: Mon Dec 11 14:09:46 2017 -0800
Committer: Nandish Jayaram
Committed: Tue Dec 12 12:38:37 2017 -0800

----------------------------------------------------------------------
 src/ports/postgres/modules/stats/correlation.py_in | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/madlib/blob/47e35744/src/ports/postgres/modules/stats/correlation.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/stats/correlation.py_in b/src/ports/postgres/modules/stats/correlation.py_in
index f658b48..d524eb1 100644
--- a/src/ports/postgres/modules/stats/correlation.py_in
+++ b/src/ports/postgres/modules/stats/correlation.py_in
@@ -179,9 +179,11 @@ def _populate_output_table(schema_madlib, source_table, output_table,
     function_name = "Correlation"
     agg_str = "{0}.correlation_agg(x, mean)".format(schema_madlib)
 
-    cols = ','.join(["coalesce({0}, avg_{0})".format(col) for col in col_names])
-    avgs = ','.join(["avg({0}) AS avg_{0}".format(col) for col in col_names])
-    avg_array = ','.join(["avg_{0}".format(col) for col in col_names])
+    cols = ','.join(["coalesce({0}, {1})".format(col, add_postfix(col, "_avg"))
+                     for col in col_names])
+    avgs = ','.join(["avg({0}) AS {1}".format(col, add_postfix(col, "_avg"))
+                     for col in col_names])
+    avg_array = ','.join([str(add_postfix(col, "_avg")) for col in col_names])
 
     # actual computation
     sql1 = """
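
To illustrate the failure mode described in the commit message, here is a minimal sketch of why a prefix added by plain concatenation breaks for a double-quoted column name, and how a quote-aware postfix avoids it. The helper `add_postfix_sketch` is a hypothetical stand-in written for this example, not MADlib's actual add_postfix(), and the column names are made up.

```python
def add_postfix_sketch(identifier, postfix):
    """Hypothetical stand-in for a quote-aware postfix helper.

    If the identifier is wrapped in double quotes, the postfix is placed
    inside the closing quote; otherwise it is simply appended.
    """
    identifier = identifier.strip()
    is_quoted = (len(identifier) >= 2
                 and identifier.startswith('"')
                 and identifier.endswith('"'))
    if is_quoted:
        return identifier[:-1] + postfix + '"'
    return identifier + postfix


# Illustrative column names: one plain, one quoted with a non-ASCII character.
col_names = ['price', '"tamaño"']

# Old approach: the prefix lands outside the quotes, giving avg_"tamaño",
# which is not a valid SQL identifier.
old_aliases = ["avg_{0}".format(col) for col in col_names]

# New approach: the postfix stays inside the quotes, giving "tamaño_avg".
new_aliases = [add_postfix_sketch(col, "_avg") for col in col_names]

# Same shape as the `avgs` expression in the diff above.
avgs = ','.join("avg({0}) AS {1}".format(col, alias)
                for col, alias in zip(col_names, new_aliases))

print(old_aliases)
print(new_aliases)
print(avgs)
```

With the quoted column, the old alias comes out as avg_"tamaño", while the sketch yields "tamaño_avg", which remains a single quoted identifier.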
