villebro commented on a change in pull request #18746:
URL: https://github.com/apache/superset/pull/18746#discussion_r807765821
##########
File path: superset/db_engine_specs/base.py
##########
@@ -649,6 +653,57 @@ def apply_limit_to_sql(
return sql
+ @classmethod
+ def apply_top_to_sql(
+ cls, sql: str, _limit: int, database: str = "Database", force: bool =
False
+ ) -> str:
+ """
+ Alters the SQL statement to apply a TOP clause
+ :param _limit: Maximum number of rows to be returned by the query
+ :param sql: SQL query
+ :param database: Database instance
+ :return: SQL query with top clause
+ """
+ sql_statement = sqlparse.format(sql, strip_comments=True)
+ query_limit: Optional[int] =
sql_parse._extract_top_from_query(sql_statement)
+ td_sel_keywords = {"SELECT","SEL"}
+ td_top_keywords = {"TOP","SAMPLE"}
Review comment:
`SEL` and `SAMPLE` seem to be specific to Teradata. We should probably
move these into `BaseEngineSpec` like this:
```python
class BaseEngineSpec:
...
# all keywords that should be treated as SELECTs
select_keywords = {"SELECT"}
# For engines that use TOP instead of LIMIT keyword,
list all keywords that should be treated similar top TOP keywords
top_keywords = {"TOP"}
```
Then for Teradata these could be set to
```python
select_keywords = {"SELECT","SEL"}
top_keywords = {"TOP","SAMPLE"}
```
and for MSSQL it would probably be ok to go with the defaults (I assume
`SEL` and `SAMPLE` aren't valid for MSSQL).
##########
File path: superset/sql_parse.py
##########
@@ -77,6 +77,29 @@ def _extract_limit_from_query(statement: TokenList) ->
Optional[int]:
return int(token.value)
return None
+def _extract_top_from_query(statement: TokenList) -> Optional[int]:
+ """
+ Extract top clause value from SQL statement.
+
+ :param statement: SQL statement
+ :return: top value extracted from query, None if no top value present in
statement
+ """
+
+ td_top_keyword = {"TOP","SAMPLE"}
+ str_statement = str(statement)
+ str_statement = str_statement.replace("\n", " ").replace("\r", "")
+ token = str_statement.rstrip().split(" ")
+ token = [part for part in token if part]
+ top = None
+ for i, _ in enumerate(token):
+ if token[i].upper() in td_top_keyword and len(token) - 1 > i:
+ try:
+ top = int(token[i + 1])
+ except ValueError:
+ top = None
+ break
+ return top
Review comment:
Here we'd need to pass in the `top_keywords` from the db engine spec to
make it possible to customize supported keywords per engine. Something like
this:
```suggestion
def _extract_top_from_query(statement: TokenList, top_keywords: Set[str]) ->
Optional[int]:
"""
Extract top clause value from SQL statement.
:param statement: SQL statement
:return: top value extracted from query, None if no top value present in
statement
"""
str_statement = str(statement)
str_statement = str_statement.replace("\n", " ").replace("\r", "")
token = str_statement.rstrip().split(" ")
token = [part for part in token if part]
top = None
for i, _ in enumerate(token):
if token[i].upper() in top_keywords and len(token) - 1 > i:
try:
top = int(token[i + 1])
except ValueError:
top = None
break
return top
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]