[ 
https://issues.apache.org/jira/browse/AIRFLOW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098361#comment-17098361
 ] 

ASF GitHub Bot commented on AIRFLOW-6481:
-----------------------------------------

potiuk commented on pull request #7703:
URL: https://github.com/apache/airflow/pull/7703#issuecomment-623092386


   We are trying to minimise cherry-picking now. Hopefully next week we are 
going to release backport operators, so you will be able to install 
airflow-backported-salesforce operators and use the operators from the new 
providers package in the previous 1.10.* Airflow (in parallel to the operators 
from 1.10). Would that be a better solution, @jeffolsi?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> SalesforceHook attempts to use .str accessor on object dtype
> ------------------------------------------------------------
>
>                 Key: AIRFLOW-6481
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6481
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.10.7
>            Reporter: Teddy Hartanto
>            Assignee: Teddy Hartanto
>            Priority: Minor
>             Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I've searched through Airflow's issues and couldn't find any report regarding 
> this. I wonder if I'm the only one who's facing this? 
> {noformat}
> Pandas version: 0.24.2{noformat}
> *Bug description*
> I'm using the SalesforceHook to fetch data from Salesforce and I encountered 
> this exception:
> {code:java}
> AttributeError: ('Can only use .str accessor with string values, which use 
> np.object_ dtype in pandas', ...)
> {code}
> The root of the problem is that some objects in Salesforce have a column 
> with a compound data type. E.g., a User's address is a Python dict:
> {code:java}
> <class 'dict'>: {'city': None, 'country': 'my', 'geocodeAccuracy': None, 
> 'latitude': None, 'longitude': None, 'postalCode': None, 'state': None, 
> 'street': None}{code}
> The problematic code is here:
> {code:java}
> if fmt == "csv":
>     # there are also a ton of newline objects
>     # that mess up our ability to write to csv
>     # we remove these newlines so that the output is a valid CSV format
>     self.log.info("Cleaning data and writing to CSV")
>     possible_strings = df.columns[df.dtypes == "object"]
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\n", "")
>     )
>     # write the dataframe
>     df.to_csv(filename, index=False)
> {code}
> Because a Series containing Python dicts is also considered to be of dtype 
> object, it is assumed to be among the "possible_strings". Then, when .str is 
> called on that Series, the exception is thrown.
> To fix it, we could explicitly cast the object type to string as follows:
> {code:java}
> if fmt == "csv":
>     ...
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\n", "")
>     )
> {code}
> I've tested this and it works for me. Could somebody help me verify that the 
> type conversion is indeed needed? If yes, I'm keen to submit a PR to fix this 
> with the unit test included.
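
The cast proposed in the description above can be tried on a toy DataFrame. This is only a minimal sketch: the helper name `strip_newlines` and the sample data are hypothetical and are not part of Airflow or the SalesforceHook. It does not try to reproduce the exception itself; it only shows that after `astype(str)` every object column, including one holding dicts, can go through `.str.replace`.

```python
import pandas as pd


def strip_newlines(df):
    """Return a copy of df whose object-dtype columns are cast to str
    and stripped of "\r\n" and "\n" (hypothetical helper, not Airflow code)."""
    cleaned = df.copy()
    # Same selection idea as in the issue: every object-dtype column
    # is treated as a possible string column.
    object_cols = [c for c in cleaned.columns if cleaned[c].dtype == object]
    for col in object_cols:
        # Cast first, so dicts and other non-string values become text.
        as_text = cleaned[col].astype(str)
        as_text = as_text.str.replace("\r\n", "", regex=False)
        cleaned[col] = as_text.str.replace("\n", "", regex=False)
    return cleaned


# Made-up sample data: a string column, a compound (dict) column, a numeric column.
sample = pd.DataFrame(
    {
        "name": pd.Series(["Ann\r\nLee", "Bob\nRay"], dtype=object),
        "address": pd.Series(
            [{"city": None, "country": "my"}, {"city": None, "country": "sg"}],
            dtype=object,
        ),
        "age": [30, 41],
    }
)

cleaned_sample = strip_newlines(sample)
print(cleaned_sample)
```

One consequence of this design is that a dict column ends up in the CSV as the text of its Python representation, which may or may not be what a consumer of the file expects.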



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
