betodealmeida commented on a change in pull request #19242:
URL: https://github.com/apache/superset/pull/19242#discussion_r829461122



##########
File path: superset/config.py
##########
@@ -443,6 +443,7 @@ def _try_json_readsha(filepath: str, length: int) -> 
Optional[str]:
     "ALLOW_FULL_CSV_EXPORT": False,
     "UX_BETA": False,
     "GENERIC_CHART_AXES": False,
+    "ALLOW_ADHOC_SUBQUERY": True,

Review comment:
       I agree, even though this is technically a breaking change it prevents 
people from bypassing security rules.

##########
File path: superset/connectors/sqla/utils.py
##########
@@ -119,3 +120,22 @@ def get_virtual_table_metadata(dataset: "SqlaTable") -> 
List[Dict[str, str]]:
     except Exception as ex:
         raise SupersetGenericDBErrorException(message=str(ex)) from ex
     return cols
+
+
+def allow_adhoc_subquery(raw_sql: str) -> bool:

Review comment:
       Ah, I was confused when I saw the use of this function on the file above 
(`superset/connectors/sqla/models.py`), I expected this to return `True` or 
`False`, but it never returns `False`.
   
   I think it might be clearer to rename this function and change its signature 
to something like:
   
   ```python
   def validate_adhoc_subquery(raw_sql: str) -> None
   ```
   
   Then in the file above it will be easier to understand what's going on.
   
   Also, can you add a docstring to this function?

##########
File path: superset/connectors/sqla/utils.py
##########
@@ -119,3 +120,22 @@ def get_virtual_table_metadata(dataset: "SqlaTable") -> 
List[Dict[str, str]]:
     except Exception as ex:
         raise SupersetGenericDBErrorException(message=str(ex)) from ex
     return cols
+
+
+def allow_adhoc_subquery(raw_sql: str) -> bool:
+    # pylint: disable=import-outside-toplevel
+    from superset import is_feature_enabled
+
+    if is_feature_enabled("ALLOW_ADHOC_SUBQUERY"):
+        return True
+
+    statement = sqlparse.parse(raw_sql)[0]
+    if has_table_query(statement):

Review comment:
       I think it's safe to assume the ad-hoc expression will have a single 
statement, but we might want to err on the side of caution just in case and do:
   
   ```python
   for statement in sqlparse.parse(raw_sql):
       if has_table_query(statement):
           raise ...
   ```

##########
File path: superset/connectors/sqla/models.py
##########
@@ -885,7 +886,8 @@ def adhoc_metric_to_sqla(
         elif expression_type == utils.AdhocMetricExpressionType.SQL:
             tp = self.get_template_processor()
             expression = tp.process_template(cast(str, 
metric["sqlExpression"]))
-            sqla_metric = literal_column(expression)
+            if allow_adhoc_subquery(expression):

Review comment:
       If this condition is not true then `sqla_metric` will be undefined in 
line 894.

##########
File path: superset/connectors/sqla/models.py
##########
@@ -1165,7 +1169,7 @@ def get_sqla_query(  # pylint: 
disable=too-many-arguments,too-many-locals,too-ma
                     # if groupby field equals a selected column
                     elif selected in columns_by_name:
                         outer = columns_by_name[selected].get_sqla_col()
-                    else:
+                    elif allow_adhoc_subquery(selected):

Review comment:
       Same here, we need a fallback value for `outer` when 
`allow_adhoc_subquery(selected)` is false.

##########
File path: superset/errors.py
##########
@@ -138,10 +139,12 @@ class SupersetErrorType(str, Enum):
     1034: _("The port number is invalid."),
     1035: _("Failed to start remote query on a worker."),
     1036: _("The database was deleted."),
+    1037: _("Custom SQL does not allow subquery."),

Review comment:
       ```suggestion
       1037: _("Custom SQL fields cannot contain subqueries."),
   ```
   
   (keeping it consistent with Aaron's suggestion)

##########
File path: superset/connectors/sqla/models.py
##########
@@ -1165,7 +1169,7 @@ def get_sqla_query(  # pylint: 
disable=too-many-arguments,too-many-locals,too-ma
                     # if groupby field equals a selected column
                     elif selected in columns_by_name:
                         outer = columns_by_name[selected].get_sqla_col()
-                    else:
+                    elif validate_adhoc_subquery(selected):

Review comment:
       Same here:
   
   ```suggestion
                       else:
                           validate_adhoc_subquery(selected)
   ```

##########
File path: superset/connectors/sqla/utils.py
##########
@@ -119,3 +120,29 @@ def get_virtual_table_metadata(dataset: "SqlaTable") -> 
List[Dict[str, str]]:
     except Exception as ex:
         raise SupersetGenericDBErrorException(message=str(ex)) from ex
     return cols
+
+
+def validate_adhoc_subquery(raw_sql: str) -> bool:
+    """
+    check adhoc SQL contains sub-queries or nested sub-queries with table
+    :param raw_sql: adhoc sql expression
+    :return True if sql contains no sub-queries or nested sub-queries with 
table
+    :raise SupersetSecurityException if sql contains sub-queries or
+    nested sub-queries with table
+    """
+    # pylint: disable=import-outside-toplevel
+    from superset import is_feature_enabled
+
+    if is_feature_enabled("ALLOW_ADHOC_SUBQUERY"):
+        return True
+
+    for statement in sqlparse.parse(raw_sql):
+        if has_table_query(statement):
+            raise SupersetSecurityException(
+                SupersetError(
+                    
error_type=SupersetErrorType.ADHOC_SUBQUERY_NOT_ALLOWED_ERROR,
+                    message=_("Custom SQL fields cannot contain sub-queries."),
+                    level=ErrorLevel.ERROR,
+                )
+            )
+    return True

Review comment:
       My suggestion here would be:
   
   ```suggestion
   def validate_adhoc_subquery(raw_sql: str) -> None:
       """
       Check if adhoc SQL contains sub-queries or nested sub-queries with table
       :param raw_sql: adhoc sql expression
       :raise SupersetSecurityException if sql contains sub-queries or
       nested sub-queries with table
       """
       # pylint: disable=import-outside-toplevel
       from superset import is_feature_enabled
   
       if is_feature_enabled("ALLOW_ADHOC_SUBQUERY"):
           return
   
       for statement in sqlparse.parse(raw_sql):
           if has_table_query(statement):
               raise SupersetSecurityException(
                   SupersetError(
                       
error_type=SupersetErrorType.ADHOC_SUBQUERY_NOT_ALLOWED_ERROR,
                       message=_("Custom SQL fields cannot contain 
sub-queries."),
                       level=ErrorLevel.ERROR,
                   )
               )
       return
   ```

##########
File path: superset/connectors/sqla/models.py
##########
@@ -1180,7 +1184,7 @@ def get_sqla_query(  # pylint: 
disable=too-many-arguments,too-many-locals,too-ma
             for selected in columns:
                 select_exprs.append(
                     columns_by_name[selected].get_sqla_col()
-                    if selected in columns_by_name
+                    if selected in columns_by_name and 
validate_adhoc_subquery(selected)

Review comment:
       This and line 1395 would also require some modification. Here something 
like:
   
   ```python
   for selected in columns:
       validate_adhoc_subquery(selected)
       selec_exprs.append(...)
   ```

##########
File path: superset/connectors/sqla/models.py
##########
@@ -906,9 +908,11 @@ def adhoc_column_to_sqla(
         """
         label = utils.get_column_name(col)
         expression = col["sqlExpression"]
+        sqla_metric = None
         if template_processor and expression:
             expression = template_processor.process_template(expression)
-        sqla_metric = literal_column(expression)
+        if expression and validate_adhoc_subquery(expression):
+            sqla_metric = literal_column(expression)

Review comment:
       Not sure if my explanation made sense, but my suggestion was to change 
this to:
   
   ```suggestion
           validate_adhoc_subquery(expression)
           sqla_metric = literal_column(expression)
   ```
   
   Because when I read the code `if expression and 
validate_adhoc_subquery(expression):` I wonder "what happens when 
`validate_adhoc_subquery(expression)` is false?", but that's a situation that 
can never happen, because the function currently can only return `True` (or 
raise an exception).
   
   Having the function return `None` instead makes it easier to understand 
what's going on where it's being used.
   
   (Alternatively, you could modify the function to return `True` or `False`, 
and raise the exception at the call site based on its return value. But because 
you're calling it from multiple places I think it's not worth it.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to