aru-trackunit opened a new issue, #39891:
URL: https://github.com/apache/airflow/issues/39891
### Apache Airflow Provider(s)
databricks
### Versions of Apache Airflow Providers
Package Name | Version | Description
-- | -- | --
apache-airflow-providers-amazon | 8.21.0 | Amazon integration (including Amazon Web Services (AWS)).
apache-airflow-providers-cncf-kubernetes | 7.13.0 | Kubernetes
apache-airflow-providers-common-io | 1.3.0 | Common IO Provider
apache-airflow-providers-common-sql | 1.11.1 | Common SQL Provider
apache-airflow-providers-databricks | 6.4.0 | Databricks
apache-airflow-providers-ftp | 3.7.0 | File Transfer Protocol (FTP)
apache-airflow-providers-github | 2.6.0 | GitHub
apache-airflow-providers-google | 10.17.0 | Google services including: Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, Google Workspace (formerly Google Suite)
apache-airflow-providers-hashicorp | 3.6.4 | Hashicorp including Hashicorp Vault
apache-airflow-providers-http | 4.10.0 | Hypertext Transfer Protocol (HTTP)
apache-airflow-providers-imap | 3.5.0 | Internet Message Access Protocol (IMAP)
apache-airflow-providers-mysql | 5.5.4 | MySQL
apache-airflow-providers-postgres | 5.10.2 | PostgreSQL
apache-airflow-providers-sftp | 4.9.1 | SSH File Transfer Protocol (SFTP)
apache-airflow-providers-slack | 8.7.0 | Slack services integration including: Slack API, Slack Incoming Webhook
apache-airflow-providers-smtp | 1.6.1 | Simple Mail Transfer Protocol (SMTP)
apache-airflow-providers-snowflake | 5.4.0 | Snowflake
apache-airflow-providers-sqlite | 3.7.1 | SQLite
apache-airflow-providers-ssh | 3.11.0 | Secure Shell (SSH)
### Apache Airflow version
2.8.4
### Operating System
Debian GNU/Linux 11 (bullseye)
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### What happened
After `DatabricksSqlOperator` finishes execution, it pushes a hard-to-parse object to XCom:
```
(
[
'(catalog_name,string,None,None,None,None,None)',
'(schema_name,string,None,None,None,None,None)'
],
[
'(prod,schema_1)',
'(prod,schema_2)'
]
)
```
Why is it hard?
When I load it into pandas, each entire row ends up in a single column instead of being split into two columns:
```python
import pandas as pd

# Result as currently returned via XCom: every row and description is a single string
test_data = (
    [
        "(catalog_name,string,None,None,None,None,None)",
        "(schema_name,string,None,None,None,None,None)",
    ],
    ["(prod,schema_1)", "(prod,schema_2)"],
)
test_df = pd.DataFrame(data=test_data[1])
```
Output `test_df`:
| |0 |
|-|-|
|0|(prod,schema_1)|
|1|(prod,schema_2)|
The same issue applies to the column descriptions (`test_data[0]`).
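Until the return shape changes, a downstream task could turn the strings back into tuples. This is a minimal sketch, assuming values never contain commas or parentheses and that the literal text `None` always means a null; `parse_stringified_row` and `parse_xcom_result` are hypothetical helpers, not part of the Databricks provider:

```python
def parse_stringified_row(text: str) -> tuple:
    """Turn a string like '(prod,schema_1)' back into ('prod', 'schema_1').

    Naive: assumes no value contains a comma or parenthesis, and maps the
    literal text 'None' to Python None.
    """
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    return tuple(None if part == "None" else part for part in inner.split(","))


def parse_xcom_result(result):
    """Split the (descriptions, rows) pair pulled from XCom into lists of tuples."""
    descriptions, rows = result
    return (
        [parse_stringified_row(d) for d in descriptions],
        [parse_stringified_row(r) for r in rows],
    )
```

Because the string form loses quoting, this parsing is lossy by design; it is a stopgap, not a substitute for returning real tuples.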
### What you think should happen instead
The first question is whether this is a bug or a feature.
IMO, the quotes wrapping each row should be removed, and quotes should instead be placed around the individual string values:
```
(
[
('catalog_name','string',None,None,None,None,None),
('schema_name','string',None,None,None,None,None)
],
[
('prod','schema_1'),
('prod','schema_2')
]
)
```
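For comparison, a plain PEP 249 (DB-API) driver already yields this shape: `cursor.description` is a sequence of 7-item sequences (name, type_code, display_size, internal_size, precision, scale, null_ok), and fetched rows are tuples. The sketch below uses the standard-library `sqlite3` driver purely as a stand-in for Databricks (an assumption for illustration; sqlite3 fills in only the column name, so the type slot is `None` rather than `'string'`). `fetch_as_tuples` is a hypothetical helper:

```python
import sqlite3


def fetch_as_tuples(sql: str):
    """Run `sql` and return (description, rows) as lists of plain tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        # One 7-item tuple per column; sqlite3 only fills in the name.
        description = [tuple(column) for column in cursor.description]
        # Copy each row into a plain tuple so no driver-specific type leaks out.
        rows = [tuple(row) for row in cursor.fetchall()]
        return description, rows
    finally:
        conn.close()


result = fetch_as_tuples(
    "SELECT 'prod' AS catalog_name, 'schema_1' AS schema_name "
    "UNION ALL SELECT 'prod', 'schema_2' "
    "ORDER BY schema_name"
)
```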
Running the same code against this structure produces a properly shaped DataFrame:
```python
import pandas as pd

# Proposed result: each row and description is a tuple of values
test_data = (
    [
        ("catalog_name", "string", None, None, None, None, None),
        ("schema_name", "string", None, None, None, None, None),
    ],
    [("prod", "schema_1"), ("prod", "schema_2")],
)
test_df = pd.DataFrame(data=test_data[1])
```
Output `test_df`:
| |0 | 1 |
|-|-|-|
|0|prod|schema_1|
|1|prod|schema_2|
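With the proposed shape, the column descriptions can also supply the DataFrame's column names. This is a minimal sketch assuming the tuple structure proposed above; `split_result` is a hypothetical helper:

```python
def split_result(result):
    """Return (column_names, rows) from a (description, rows) pair."""
    description, rows = result
    # The first item of each description tuple is the column name (PEP 249).
    column_names = [column[0] for column in description]
    return column_names, [tuple(row) for row in rows]


test_data = (
    [
        ("catalog_name", "string", None, None, None, None, None),
        ("schema_name", "string", None, None, None, None, None),
    ],
    [("prod", "schema_1"), ("prod", "schema_2")],
)
columns, rows = split_result(test_data)
```

Passing both to `pd.DataFrame(rows, columns=columns)` would give columns named `catalog_name` and `schema_name` instead of `0` and `1`.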
I haven't found the exact place in the code where this happens. Does it affect only `DatabricksSqlOperator`, or is it wider `SQLOperator` behaviour?
### How to reproduce
```python
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

validate_prod_schema_privileges = DatabricksSqlOperator(
    task_id="validate_prod_schema_privileges",
    databricks_conn_id="conn-id",
    sql_endpoint_name="endpoint_name",
    sql=(
        "SELECT DISTINCT table_catalog AS catalog_name, table_schema AS schema_name "
        "FROM prod.information_schema.tables"
    ),
)
```
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)