[GitHub] [superset] ktmud commented on a change in pull request #13739: fix(sqla-query): order by aggregations in Presto and Hive

GitBox Tue, 23 Mar 2021 20:15:47 -0700


ktmud commented on a change in pull request #13739:
URL: https://github.com/apache/superset/pull/13739#discussion_r599915437




##########
File path: superset/connectors/sqla/models.py
##########
@@ -465,7 +467,7 @@ class SqlaTable(  # pylint: 
disable=too-many-public-methods,too-many-instance-at
     database_id = Column(Integer, ForeignKey("dbs.id"), nullable=False)
     fetch_values_predicate = Column(String(1000))
     owners = relationship(owner_class, secondary=sqlatable_user, 
backref="tables")
-    database = relationship(
+    database: Database = relationship(

Review comment:
       Bycatch: this hint allows quick jump to instance methods/attributes in 
the IDE.

##########
File path: superset/connectors/sqla/models.py
##########
@@ -708,11 +694,10 @@ def health_check_message(self) -> Optional[str]:
     def data(self) -> Dict[str, Any]:
         data_ = super().data
         if self.type == "table":
-            grains = self.database.grains() or []
-            if grains:
-                grains = [(g.duration, g.name) for g in grains]
             data_["granularity_sqla"] = utils.choicify(self.dttm_cols)
-            data_["time_grain_sqla"] = grains
+            data_["time_grain_sqla"] = [
+                (g.duration, g.name) for g in self.database.grains() or []
+            ]

Review comment:
       Remove indirectness to fix typing.

##########
File path: superset/connectors/sqla/models.py
##########
@@ -507,22 +509,6 @@ class SqlaTable(  # pylint: 
disable=too-many-public-methods,too-many-instance-at
         "MAX": sa.func.MAX,
     }
 
-    def make_sqla_column_compatible(
-        self, sqla_col: Column, label: Optional[str] = None
-    ) -> Column:
-        """Takes a sqlalchemy column object and adds label info if supported 
by engine.
-        :param sqla_col: sqlalchemy column instance
-        :param label: alias/label that column is expected to have
-        :return: either a sql alchemy column or label instance if supported by 
engine
-        """
-        label_expected = label or sqla_col.name
-        db_engine_spec = self.database.db_engine_spec
-        # add quotes to tables
-        if db_engine_spec.allows_alias_in_select:
-            label = db_engine_spec.make_label_compatible(label_expected)
-            sqla_col = sqla_col.label(label)
-        return sqla_col

Review comment:
       Moved to make sure built-in methods are at the top of the file.

##########
File path: tests/query_context_tests.py
##########
@@ -36,6 +37,17 @@
 from tests.fixtures.query_context import get_query_context
 
 
+def get_sql_text(payload: QueryContext) -> str:
+    payload["result_type"] = ChartDataResultType.QUERY.value
+    query_context = ChartDataQueryContextSchema().load(payload)
+    responses = query_context.get_payload()
+    assert len(responses) == 1
+    response = responses["queries"][0]
+    assert len(response) == 2
+    assert response["language"] == "sql"
+    return response["query"]

Review comment:
       Refactor to DRY.

##########
File path: tests/conftest.py
##########
@@ -73,13 +73,22 @@ def drop_from_schema(engine: Engine, schema_name: str):
 
 def setup_presto_if_needed():
     backend = app.config["SQLALCHEMY_EXAMPLES_URI"].split("://")[0]
+    database = get_example_database()
+    extra = database.get_extra()
+
     if backend == "presto":
         # decrease poll interval for tests
-        presto_poll_interval = app.config["PRESTO_POLL_INTERVAL"]
-        extra = f'{{"engine_params": {{"connect_args": {{"poll_interval": 
{presto_poll_interval}}}}}}}'
-        database = get_example_database()
-        database.extra = extra
-        db.session.commit()
+        extra = {
+            **extra,
+            "engine_params": {
+                "connect_args": {"poll_interval": 
app.config["PRESTO_POLL_INTERVAL"]}
+            },
+        }
+    else:
+        # remove `poll_interval` from databases that do not support it
+        extra = {**extra, "engine_params": {}}

Review comment:
       Bycatch, this fixes the case when you repetitively run independent tests 
with different backends as `poll_interval` param is only accepted by Presto 
connector.

##########
File path: superset/db_engine_specs/base.py
##########
@@ -368,20 +388,6 @@ def get_time_grain_expressions(cls) -> Dict[Optional[str], 
str]:
             time_grain_expressions.pop(key)
         return time_grain_expressions
 
-    @classmethod
-    def make_select_compatible(
-        cls, groupby_exprs: Dict[str, ColumnElement], select_exprs: 
List[ColumnElement]
-    ) -> List[ColumnElement]:
-        """
-        Some databases will just return the group-by field into the select, 
but don't
-        allow the group-by field to be put into the select list.
-
-        :param groupby_exprs: mapping between column name and column object
-        :param select_exprs: all columns in the select clause
-        :return: columns to be included in the final select clause
-        """
-        return select_exprs

Review comment:
       Bycatch, this methods is not useful anymore after 
https://github.com/apache/superset/pull/9954
   
   Unless users have custom engines that use this method, this clean up should 
be safe.

##########
File path: superset/connectors/sqla/models.py
##########
@@ -882,6 +867,47 @@ def adhoc_metric_to_sqla(
 
         return self.make_sqla_column_compatible(sqla_metric, label)
 
+    def make_sqla_column_compatible(
+        self, sqla_col: Column, label: Optional[str] = None
+    ) -> Column:
+        """Takes a sqlalchemy column object and adds label info if supported 
by engine.
+        :param sqla_col: sqlalchemy column instance
+        :param label: alias/label that column is expected to have
+        :return: either a sql alchemy column or label instance if supported by 
engine
+        """
+        label_expected = label or sqla_col.name
+        db_engine_spec = self.database.db_engine_spec
+        # add quotes to tables
+        if db_engine_spec.allows_alias_in_select:
+            label = db_engine_spec.make_label_compatible(label_expected)
+            sqla_col = sqla_col.label(label)
+        return sqla_col
+
+    def make_orderby_compatible(self, query: Select) -> None:
+        """
+        If needed, make sure aliases for selected columns are not used in
+        `ORDER BY`.
+
+        In some databases (e.g. Presto), `ORDER BY` clause is not able to
+        automatically pick the source column if a `SELECT` clause alias is 
named
+        the same as a source column. In this case, we update the SELECT alias 
to
+        another name to avoid the conflict.
+        """
+        if self.database.db_engine_spec.allows_alias_to_source_column:
+            return
+
+        orderby_clause = str(query._order_by_clause)  # pylint: 
disable=protected-access
+
+        # Iterate through selected columns, if column alias appears in orderby
+        # use another `alias`. The final output columns will still use the
+        # original names, because they are updated by `labels_expected` after
+        # querying.
+        for col in query.inner_columns:
+            if isinstance(col, Label) and re.search(
+                f"\\b{col.name}\\b", orderby_clause
+            ):
+                col.name = f"{col.name}__"

Review comment:
       This is the easiest way to resolve column name conflicts. As it turns 
out, adding table prefix to source columns used in metrics as proposed 
[here](https://github.com/apache/superset/issues/13426#issuecomment-789400603) 
is not as easy as it seems, especially when we need to account for free form 
custom SQL.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [superset] ktmud commented on a change in pull request #13739: fix(sqla-query): order by aggregations in Presto and Hive

Reply via email to