potiuk commented on code in PR #27854:
URL: https://github.com/apache/airflow/pull/27854#discussion_r1030510789
##########
airflow/providers/databricks/operators/databricks_sql.py:
##########
@@ -114,16 +112,17 @@ def __init__(
**client_parameters,
**hook_params,
}
+ self.schema = schema
def get_db_hook(self) -> DatabricksSqlHook:
return DatabricksSqlHook(self.databricks_conn_id, **self.hook_params)
- def _process_output(self, schema, results):
Review Comment:
Yeah, I saw exactly this problem with `schema`. The mixture of terms aside, the real problem is that currently it would not work: we cannot use `_process_output` as is, because it expects `schema` as its first parameter and it would receive something very different. I am quite OK with setting the schema and other fields (I have a change in progress but got a bit distracted).
But I think we also have to fix `_process_output`.
The way `_process_output` would work after #25717 is rather cumbersome, as I understand it:
* If the result was scalar, it would call `_process_output` with the FIRST column value of the row as `schema`, passing the remaining values as extra positional parameters. If the row had exactly 2 values it would work, but otherwise it would fail with `TypeError: _process_output() takes 2 positional arguments but N were given`.
* If the result was not scalar, it would repeat the `_process_output` call once for every returned row.
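A minimal sketch of the failure mode described above (plain functions, illustrative names only, not the actual Airflow code):

```python
# Hypothetical stand-in for the operator's output hook; like the real
# method, it accepts exactly two positional parameters.
def _process_output(schema, results):
    return schema, results


def dispatch(row, scalar: bool):
    """Sketch of the post-#25717 dispatch as described in the comment."""
    if scalar:
        # Scalar result: the FIRST column value lands in ``schema`` and
        # the remaining values become extra positional arguments.
        return _process_output(*row)
    # Non-scalar result: the call is repeated once per returned row.
    return [_process_output(*r) for r in row]


# A two-value row happens to line up with the signature and "works":
dispatch(("my_schema", [1, 2]), scalar=True)

# A row with any other width blows up:
# TypeError: _process_output() takes 2 positional arguments but 3 were given
try:
    dispatch(("my_schema", 1, 2), scalar=True)
except TypeError as exc:
    print(exc)
```

The accidental success of the two-value case is what makes the bug easy to miss in simple tests.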
I am working on a fix
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]