aru-trackunit opened a new issue, #39891:
URL: https://github.com/apache/airflow/issues/39891

   ### Apache Airflow Provider(s)
   
   databricks
   
   ### Versions of Apache Airflow Providers
   
   Package Name | Version | Description
   -- | -- | --
   apache-airflow-providers-amazon | 8.21.0 | Amazon integration (including Amazon Web Services (AWS))
   apache-airflow-providers-cncf-kubernetes | 7.13.0 | Kubernetes
   apache-airflow-providers-common-io | 1.3.0 | Common IO Provider
   apache-airflow-providers-common-sql | 1.11.1 | Common SQL Provider
   apache-airflow-providers-databricks | 6.4.0 | Databricks
   apache-airflow-providers-ftp | 3.7.0 | File Transfer Protocol (FTP)
   apache-airflow-providers-github | 2.6.0 | GitHub
   apache-airflow-providers-google | 10.17.0 | Google services including Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, Google Workspace (formerly Google Suite)
   apache-airflow-providers-hashicorp | 3.6.4 | Hashicorp including Hashicorp Vault
   apache-airflow-providers-http | 4.10.0 | Hypertext Transfer Protocol (HTTP)
   apache-airflow-providers-imap | 3.5.0 | Internet Message Access Protocol (IMAP)
   apache-airflow-providers-mysql | 5.5.4 | MySQL
   apache-airflow-providers-postgres | 5.10.2 | PostgreSQL
   apache-airflow-providers-sftp | 4.9.1 | SSH File Transfer Protocol (SFTP)
   apache-airflow-providers-slack | 8.7.0 | Slack services integration including Slack API, Slack Incoming Webhook
   apache-airflow-providers-smtp | 1.6.1 | Simple Mail Transfer Protocol (SMTP)
   apache-airflow-providers-snowflake | 5.4.0 | Snowflake
   apache-airflow-providers-sqlite | 3.7.1 | SQLite
   apache-airflow-providers-ssh | 3.11.0 | Secure Shell (SSH)
   
   ### Apache Airflow version
   
   2.8.4
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   After `DatabricksSqlOperator` finishes execution, it pushes an object to XCom that is hard to parse:
   
   ```
   (
       [
           '(catalog_name,string,None,None,None,None,None)',
           '(schema_name,string,None,None,None,None,None)'
       ],
       [
           '(prod,schema_1)', 
           '(prod,schema_2)'
       ]
   )
   ```
   
   Why is it hard?
   When I load it into pandas, each row is loaded into a single column instead of being split into two columns:
   ```
   import pandas as pd
   
   test_data = (
       ['(catalog_name,string,None,None,None,None,None)', '(schema_name,string,None,None,None,None,None)'],
       ['(prod,schema_1)', '(prod,schema_2)'],
   )
   
   # Each row is one string, so pandas places the whole row in a single column.
   test_df = pd.DataFrame(data=test_data[1])
   ```
   
   Output test_df:
   
   |   | 0 |
   |-|-|
   |0|(prod,schema_1)|
   |1|(prod,schema_2)|
   
   The same issue applies to the column descriptions (the first list in the tuple).
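
Until the returned structure changes, the strings can be split back into tuples on the consumer side. The following is only a rough workaround sketch: `parse_stringified_row` is a hypothetical helper, not part of any provider, and it assumes no value contains a comma or a parenthesis (the string form does not preserve enough information to handle those cases).

```python
from typing import Optional


def parse_stringified_row(row: str) -> tuple[Optional[str], ...]:
    """Turn '(prod,schema_1)' into ('prod', 'schema_1').

    Assumption: values never contain commas or parentheses.
    """
    inner = row.strip()
    # Drop the wrapping parentheses if both are present.
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # The literal text "None" is mapped back to None; a genuine string
    # value "None" cannot be told apart from it.
    return tuple(None if field == "None" else field for field in inner.split(","))


test_data = (
    ["(catalog_name,string,None,None,None,None,None)", "(schema_name,string,None,None,None,None,None)"],
    ["(prod,schema_1)", "(prod,schema_2)"],
)

description = [parse_stringified_row(col) for col in test_data[0]]
rows = [parse_stringified_row(row) for row in test_data[1]]
column_names = [col[0] for col in description]
```

Passing the result to `pd.DataFrame(rows, columns=column_names)` should then yield one column per field. All values come back as strings, so any type information has to be recovered separately from the description.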
   
   
   
   
   
   ### What you think should happen instead
   
   The first question is whether this is a bug or a feature.
   
   IMO the quotes wrapping each row should be removed and applied to the individual string values instead:
   
   ```
   (
       [
           ('catalog_name','string',None,None,None,None,None),
           ('schema_name','string',None,None,None,None,None)
       ],
       [
           ('prod','schema_1'), 
           ('prod','schema_2')
       ]
   )
   ```
   Running the same code against this structure gives a properly shaped DataFrame.
   ```
   import pandas as pd
   
   test_data = (
       [
           ('catalog_name', 'string', None, None, None, None, None),
           ('schema_name', 'string', None, None, None, None, None),
       ],
       [('prod', 'schema_1'), ('prod', 'schema_2')],
   )
   
   # Each row is a tuple, so pandas creates one column per value.
   test_df = pd.DataFrame(data=test_data[1])
   ```
   
   Output test_df:
   
   |   | 0 | 1 |
   |-|-|-|
   |0|prod|schema_1|
   |1|prod|schema_2|
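
With tuples instead of strings, the first list could also supply the column names. A minimal sketch, assuming the proposed structure (description tuples first, row tuples second); `to_records` is a hypothetical helper used only for illustration:

```python
# Proposed structure: (column descriptions, rows), both as lists of tuples.
proposed = (
    [
        ("catalog_name", "string", None, None, None, None, None),
        ("schema_name", "string", None, None, None, None, None),
    ],
    [("prod", "schema_1"), ("prod", "schema_2")],
)


def to_records(result):
    """Pair each row's values with the column names from the description."""
    description, rows = result
    # The first item of every description tuple is the column name.
    names = [column[0] for column in description]
    return [dict(zip(names, row)) for row in rows]


records = to_records(proposed)
```

`pd.DataFrame(records)` would then produce columns named `catalog_name` and `schema_name` rather than `0` and `1`.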
   
   
   I haven't found the exact place in the code where this happens. Does it occur only in `DatabricksSqlOperator`, or is it broader `SQLOperator` behaviour?
   
   ### How to reproduce
   
   ```
   from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator
   
   validate_prod_schema_privileges = DatabricksSqlOperator(
       task_id="validate_prod_schema_privileges",
       dag_default_args={},
       databricks_conn_id="conn-id",
       sql_endpoint_name="endpoint_name",
       # The result of this query is what gets pushed to XCom.
       sql=(
           "SELECT DISTINCT table_catalog as catalog_name, table_schema as schema_name "
           "FROM prod.information_schema.tables"
       ),
   )
   ```
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

