villebro commented on a change in pull request #18746:
URL: https://github.com/apache/superset/pull/18746#discussion_r807996512



##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +658,70 @@ def apply_limit_to_sql(
 
         return sql
 
+    @classmethod
+    def apply_top_to_sql(
+        cls, sql: str, _limit: int, database: str = "Database", force: bool = 
False
+    ) -> str:
+        """
+        Alters the SQL statement to apply a TOP clause
+        :param _limit: Maximum number of rows to be returned by the query
+        :param sql: SQL query
+        :param database: Database instance
+        :return: SQL query with top clause
+        """
+
+        td_sel_keywords = cls.select_keywords
+        td_top_keywords = cls.top_keywords
+        cte = None
+        sql_remainder = None
+        sql_statement = sqlparse.format(sql, strip_comments=True)
+        query_limit: Optional[int] = 
sql_parse._extract_top_from_query(sql_statement,
+                                                                       
td_top_keywords)
+        if not _limit:
+            final_limit = query_limit
+        elif int(query_limit or 0) < _limit and query_limit is not None:
+            final_limit = query_limit
+        else:
+            final_limit = _limit
+        if not cls.allows_cte_in_subquery:
+            cte, sql_remainder = 
sql_parse.get_cte_reminder_query(sql_statement)
+        if cte:
+            str_statement = str(sql_remainder)
+            cte = cte + "\n"
+        else:
+            cte = ""
+            str_statement = str(sql)
+        str_statement = str_statement.replace("\n", " ").replace("\r", "")
+
+        tokens = str_statement.rstrip().split(" ")
+        tokens = [token for token in tokens if token]
+        if cls.top_not_in_sql(str_statement, td_top_keywords):

Review comment:
       Since `top_not_in_sql` is a class method, we probably don't need to pass 
`cls.top_keywords` to it:
   ```suggestion
           if cls.top_not_in_sql(str_statement):
   ```

##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +658,70 @@ def apply_limit_to_sql(
 
         return sql
 
+    @classmethod
+    def apply_top_to_sql(
+        cls, sql: str, _limit: int, database: str = "Database", force: bool = 
False
+    ) -> str:
+        """
+        Alters the SQL statement to apply a TOP clause
+        :param _limit: Maximum number of rows to be returned by the query
+        :param sql: SQL query
+        :param database: Database instance
+        :return: SQL query with top clause
+        """
+
+        td_sel_keywords = cls.select_keywords
+        td_top_keywords = cls.top_keywords

Review comment:
       Not sure why we need to do this - to me it appears originally the `td_` 
prefix was just meant to indicate that they apply only to Teradata, so no need 
to keep that prefix in the base class.
   ```suggestion
   ```

##########
File path: superset/db_engine_specs/base.py
##########
@@ -697,6 +770,7 @@ def get_cte_query(cls, sql: str) -> Optional[str]:
 
         return None
 
+

Review comment:
       nit: This will also cause a linting error
   ```suggestion
   ```

##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +658,70 @@ def apply_limit_to_sql(
 
         return sql
 
+    @classmethod
+    def apply_top_to_sql(
+        cls, sql: str, _limit: int, database: str = "Database", force: bool = 
False
+    ) -> str:
+        """
+        Alters the SQL statement to apply a TOP clause
+        :param _limit: Maximum number of rows to be returned by the query
+        :param sql: SQL query
+        :param database: Database instance
+        :return: SQL query with top clause
+        """
+
+        td_sel_keywords = cls.select_keywords
+        td_top_keywords = cls.top_keywords
+        cte = None
+        sql_remainder = None
+        sql_statement = sqlparse.format(sql, strip_comments=True)
+        query_limit: Optional[int] = 
sql_parse._extract_top_from_query(sql_statement,
+                                                                       
td_top_keywords)
+        if not _limit:
+            final_limit = query_limit
+        elif int(query_limit or 0) < _limit and query_limit is not None:
+            final_limit = query_limit
+        else:
+            final_limit = _limit
+        if not cls.allows_cte_in_subquery:
+            cte, sql_remainder = 
sql_parse.get_cte_reminder_query(sql_statement)
+        if cte:
+            str_statement = str(sql_remainder)
+            cte = cte + "\n"
+        else:
+            cte = ""
+            str_statement = str(sql)
+        str_statement = str_statement.replace("\n", " ").replace("\r", "")
+
+        tokens = str_statement.rstrip().split(" ")
+        tokens = [token for token in tokens if token]
+        if cls.top_not_in_sql(str_statement, td_top_keywords):
+            selects = [i for i, word in enumerate(tokens) if
+                       word.upper() in td_sel_keywords]
+            first_select = selects[0]
+            tokens.insert(first_select + 1, "TOP")
+            tokens.insert(first_select + 2, str(final_limit))
+
+        next_is_limit_token = False
+        new_tokens = []
+
+        for token in tokens:
+            if token in td_top_keywords:
+                next_is_limit_token = True
+            elif next_is_limit_token:
+                if token.isdigit():
+                    token = str(final_limit)
+                    next_is_limit_token = False
+            new_tokens.append(token)
+        sql = " ".join(new_tokens)
+        return cte + sql
+
+    def top_not_in_sql(sql: str, top_words: Set[str]) -> bool:
+        for top_word in top_words:
+            if top_word.upper() in sql.upper():
+                return False
+        return True

Review comment:
       Same change here to reflect the comment above:
   ```suggestion
       def top_not_in_sql(sql: str) -> bool:
           for top_keyword in cls.top_keywords:
               if top_keyword.upper() in sql.upper():
                   return False
           return True
   ```

##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +658,70 @@ def apply_limit_to_sql(
 
         return sql
 
+    @classmethod
+    def apply_top_to_sql(
+        cls, sql: str, _limit: int, database: str = "Database", force: bool = 
False
+    ) -> str:
+        """
+        Alters the SQL statement to apply a TOP clause
+        :param _limit: Maximum number of rows to be returned by the query
+        :param sql: SQL query
+        :param database: Database instance
+        :return: SQL query with top clause
+        """
+
+        td_sel_keywords = cls.select_keywords
+        td_top_keywords = cls.top_keywords
+        cte = None
+        sql_remainder = None
+        sql_statement = sqlparse.format(sql, strip_comments=True)
+        query_limit: Optional[int] = 
sql_parse._extract_top_from_query(sql_statement,
+                                                                       
td_top_keywords)

Review comment:
       Here we can just reference the class property:
   ```suggestion
           query_limit: Optional[int] = 
sql_parse._extract_top_from_query(sql_statement,
                                                                          
cls.top_keywords)
   ```

##########
File path: superset/db_engine_specs/mssql.py
##########
@@ -48,6 +48,9 @@ class MssqlEngineSpec(BaseEngineSpec):
     limit_method = LimitMethod.WRAP_SQL
     max_column_name_length = 128
     allows_cte_in_subquery = False
+    allow_limit_clause = False
+    sel_keywords = {"SELECT"}
+    top_keywords = {"TOP"}

Review comment:
       wasn't this called `select_keywords` in `BaseEngineSpec`? Also, no need 
to override if the property is unchanged in the extending class.

##########
File path: superset/db_engine_specs/base.py
##########
@@ -638,8 +647,8 @@ def apply_limit_to_sql(
             sql = sql.strip("\t\n ;")
             qry = (
                 select("*")
-                .select_from(TextAsFrom(text(sql), ["*"]).alias("inner_qry"))
-                .limit(limit)
+                    .select_from(TextAsFrom(text(sql), 
["*"]).alias("inner_qry"))
+                    .limit(limit)

Review comment:
       I expect these changes to cause linting errors

##########
File path: superset/db_engine_specs/base.py
##########
@@ -300,6 +299,16 @@ class BaseEngineSpec:  # pylint: 
disable=too-many-public-methods
     # If True, then it will allow  in subquery ,
     # if False it will allow as regular CTE
     allows_cte_in_subquery = True
+    # Whether allow LIMIT clause in the SQL
+    # If True, then the database engine is allowed for LIMIT clause
+    # If False, then the database engine is allowed for TOP clause
+    allow_limit_clause = True
+    # This list will give the default list of keywords for select statements
+    # to consider in the limit/top SQl parsing
+    select_keywords = {"SELECT"}
+    # This list will give the default list of keywords for data limit 
statements
+    # to consider in the limit/top SQl parsing
+    top_keywords = {"LIMIT"}

Review comment:
       Hmm, if this will be used for all engines, even those using the `LIMIT` 
syntax, then the property should probably be called `limit_keywords` to keep it 
as aligned with ANSI SQL as possible. But I'd probably keep this specific to 
`TOP` notation and leave it undefined for `BaseEngineSpec`. Something like
   `top_keywords: Set[str] = None` to underscore that this is not relevant for 
the majority of engines. The same might also apply to `select_keywords`, as it 
appears we only reference that property in the `apply_top_to_sql` class method, 
but not in `apply_limit_to_sql`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to