[GitHub] [superset] villebro commented on a change in pull request #18746: (WIP) Fix for SQL-TOP/LIMIT Clause Apply based on database engines

GitBox Wed, 16 Feb 2022 02:27:56 -0800


villebro commented on a change in pull request #18746:
URL: https://github.com/apache/superset/pull/18746#discussion_r807765821




##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +653,57 @@ def apply_limit_to_sql(
 
         return sql
 
+    @classmethod
+    def apply_top_to_sql(
+        cls, sql: str, _limit: int, database: str = "Database", force: bool = 
False
+    ) -> str:
+        """
+        Alters the SQL statement to apply a TOP clause
+        :param _limit: Maximum number of rows to be returned by the query
+        :param sql: SQL query
+        :param database: Database instance
+        :return: SQL query with top clause
+        """
+        sql_statement = sqlparse.format(sql, strip_comments=True)
+        query_limit: Optional[int] = 
sql_parse._extract_top_from_query(sql_statement)
+        td_sel_keywords = {"SELECT","SEL"}
+        td_top_keywords = {"TOP","SAMPLE"}

Review comment:
       `SEL` and `SAMPLE` seem to be specific to Teradata. We should probably 
move these into `BaseEngineSpec` like this:
   
   ```python
   class BaseEngineSpec:
       ...
       # all keywords that should be treated as SELECTs
       select_keywords = {"SELECT"}
       # For engines that use TOP instead of LIMIT keyword, 
       list all keywords that should be treated similar top TOP keywords
       top_keywords = {"TOP"}
   ```
   
   Then for Teradata these could be set to
   ```python
       select_keywords = {"SELECT","SEL"}
       top_keywords = {"TOP","SAMPLE"}
   ```
   
   and for MSSQL it would probably be ok to go with the defaults (I assume 
`SEL` and `SAMPLE` aren't valid for MSSQL).

##########
File path: superset/sql_parse.py
##########
@@ -77,6 +77,29 @@ def _extract_limit_from_query(statement: TokenList) -> 
Optional[int]:
                 return int(token.value)
     return None
 
+def _extract_top_from_query(statement: TokenList) -> Optional[int]:
+    """
+    Extract top clause value from SQL statement.
+
+    :param statement: SQL statement
+    :return: top value extracted from query, None if no top value present in 
statement
+    """
+
+    td_top_keyword = {"TOP","SAMPLE"}
+    str_statement = str(statement)
+    str_statement = str_statement.replace("\n", " ").replace("\r", "")
+    token = str_statement.rstrip().split(" ")
+    token = [part for part in token if part]
+    top = None
+    for i, _ in enumerate(token):
+        if token[i].upper() in td_top_keyword and len(token) - 1 > i:
+            try:
+                top = int(token[i + 1])
+            except ValueError:
+                top = None
+            break
+    return top

Review comment:
       Here we'd need to pass in the `top_keywords` from the db engine spec to 
make it possible to customize supported keywords per engine. Something like 
this:
   ```suggestion
   def _extract_top_from_query(statement: TokenList, top_keywords: Set[str]) -> 
Optional[int]:
       """
       Extract top clause value from SQL statement.
   
       :param statement: SQL statement
       :return: top value extracted from query, None if no top value present in 
statement
       """
   
       str_statement = str(statement)
       str_statement = str_statement.replace("\n", " ").replace("\r", "")
       token = str_statement.rstrip().split(" ")
       token = [part for part in token if part]
       top = None
       for i, _ in enumerate(token):
           if token[i].upper() in top_keywords and len(token) - 1 > i:
               try:
                   top = int(token[i + 1])
               except ValueError:
                   top = None
               break
       return top
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [superset] villebro commented on a change in pull request #18746: (WIP) Fix for SQL-TOP/LIMIT Clause Apply based on database engines

Reply via email to