[
https://issues.apache.org/jira/browse/AIRFLOW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098362#comment-17098362
]
ASF GitHub Bot commented on AIRFLOW-6481:
-----------------------------------------
potiuk edited a comment on pull request #7703:
URL: https://github.com/apache/airflow/pull/7703#issuecomment-623092386
We are trying to minimise cherry-picking now. Next week, hopefully, we are
going to release backport operators, so you will be able to install the
`apache-airflow-backported-salesforce` package and use the operators from the
new providers package in the previous 1.10.* Airflow (in parallel to the
operators from 1.10). Would that be a better solution, @jeffolsi?
> SalesforceHook attempts to use .str accessor on object dtype
> ------------------------------------------------------------
>
> Key: AIRFLOW-6481
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6481
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Affects Versions: 1.10.7
> Reporter: Teddy Hartanto
> Assignee: Teddy Hartanto
> Priority: Minor
> Fix For: 2.0.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I've searched through Airflow's issues and couldn't find any report regarding
> this. I wonder if I'm the only one who's facing this?
> {noformat}
> pandas version: 0.24.2{noformat}
> *Bug description*
> I'm using the SalesforceHook to fetch data from SalesForce and I encountered
> this exception:
> {code:java}
> AttributeError: ('Can only use .str accessor with string values, which use
> np.object_ dtype in pandas', ...)
> {code}
> The root of the problem is that some objects in Salesforce have a column
> with a compound data type. E.g. a User's address is a Python dict:
> {code:java}
> <class 'dict'>: {'city': None, 'country': 'my', 'geocodeAccuracy': None,
> 'latitude': None, 'longitude': None, 'postalCode': None, 'state': None,
> 'street': None}{code}
> The problematic code is here:
> {code:java}
> if fmt == "csv":
>     # there are also a ton of newline objects
>     # that mess up our ability to write to csv
>     # we remove these newlines so that the output is a valid CSV format
>     self.log.info("Cleaning data and writing to CSV")
>     possible_strings = df.columns[df.dtypes == "object"]
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\n", "")
>     )
>     # write the dataframe
>     df.to_csv(filename, index=False)
> {code}
> Because a Series containing Python dicts is also of dtype object, it is
> assumed to be one of the "possible_strings". When .str is then called on
> that Series, the exception is thrown.
> To fix it, we could explicitly cast the object type to string as such:
> {code:java}
> if fmt == "csv":
>     ...
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\n", "")
>     )
> {code}
> I've tested this and it works for me. Could somebody help me verify that the
> type conversion is indeed needed? If yes, I'm keen to submit a PR to fix this
> with the unit test included.
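
For anyone who wants to try the proposed cast outside Airflow, here is a minimal standalone sketch. The helper name `strip_newlines` and the sample data are made up for illustration and are not part of SalesforceHook; the sketch only assumes pandas is installed. It casts each object-dtype column to `str` before using the `.str` accessor, so a column holding dicts is handled the same way as a column holding text.

```python
import pandas as pd


def strip_newlines(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with newlines removed from object-dtype columns.

    Hypothetical helper: each object column is cast to str first, so
    non-string values (e.g. dicts) go through str() before .str is used.
    """
    cleaned = df.copy()
    object_cols = cleaned.columns[cleaned.dtypes == "object"]
    for col in object_cols:
        as_text = cleaned[col].astype(str)
        # regex=False: treat the patterns as literal text, not regexes
        as_text = as_text.str.replace("\r\n", "", regex=False)
        cleaned[col] = as_text.str.replace("\n", "", regex=False)
    return cleaned


# Made-up sample: one text column, one compound (dict) column, one numeric.
df = pd.DataFrame(
    {
        "Name": pd.Series(["Ada\r\nLovelace", "Grace\nHopper"], dtype=object),
        "Address": pd.Series(
            [{"city": None, "country": "my"}, {"city": "x", "country": "us"}],
            dtype=object,
        ),
        "Age": [36, 45],
    }
)

cleaned = strip_newlines(df)
```

One thing to keep in mind with this approach: the dict column ends up in the CSV as its Python string representation, and depending on the pandas version, `astype(str)` may also turn missing values into literal text such as `None` or `nan`. Whether that is acceptable for the exported file is a separate decision.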