[ https://issues.apache.org/jira/browse/AIRFLOW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Teddy Hartanto updated AIRFLOW-6481:
------------------------------------
Description:
I've searched through Airflow's issues and couldn't find any report of this,
so I wonder whether I'm the only one running into it.
*Bug description*
I'm using the SalesforceHook to fetch data from Salesforce, and I encountered
this exception:
{code}
AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', ...)
{code}
This exception occurs here:
{code:python}
if fmt == "csv":
    # there are also a ton of newline objects
    # that mess up our ability to write to csv
    # we remove these newlines so that the output is a valid CSV format
    self.log.info("Cleaning data and writing to CSV")
    possible_strings = df.columns[df.dtypes == "object"]
    df[possible_strings] = df[possible_strings].apply(
        lambda x: x.str.replace("\r\n", "")
    )
    df[possible_strings] = df[possible_strings].apply(
        lambda x: x.str.replace("\n", "")
    )
    # write the dataframe
    df.to_csv(filename, index=False)
{code}
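The failure can be reproduced outside Airflow with a small sketch. The records below are made up for illustration (the field names Description and IsDeleted are assumptions, not taken from a real Salesforce org): a column holding booleans mixed with a missing value ends up with object dtype, so the df.dtypes == "object" selection picks it up, but pandas rejects the .str accessor because the values are not strings. The regex=False argument only makes the literal match explicit.

```python
import pandas as pd

# Hypothetical query results: "Description" holds text, while "IsDeleted"
# holds booleans mixed with a missing value, so pandas stores it as object dtype.
records = [
    {"Description": "line one\r\nline two", "IsDeleted": False},
    {"Description": "no newline", "IsDeleted": None},
]
df = pd.DataFrame(records)

# Same selection as the hook: every object column is assumed to hold strings.
possible_strings = df.columns[df.dtypes == "object"]

error = None
try:
    df[possible_strings].apply(
        lambda x: x.str.replace("\r\n", "", regex=False)
    )
except AttributeError as exc:
    error = exc  # same exception type the hook reports

print(list(possible_strings))
print(repr(error))
```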
To fix it, we can cast the object-dtype columns to string before calling
.str.replace:
{code:python}
if fmt == "csv":
    ...
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\r\n", "")
    )
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\n", "")
    )
{code}
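The proposed patch can be exercised outside Airflow with a standalone sketch. clean_with_astype mirrors the two patched lines; clean_strings_only is a hypothetical alternative, not code from the hook, that strips newlines only from values that are already str and leaves everything else untouched. Both avoid the AttributeError. The difference shows up in the CSV: astype(str) turns every value in the selected columns into text, so a boolean becomes 'False' and, depending on the pandas version, a missing value may be written as a literal string such as 'None' instead of an empty field. The sample records are made up, and dtype=object is forced only so the example behaves the same across pandas versions.

```python
import pandas as pd


def clean_with_astype(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror of the proposed patch: cast object columns to str, then strip newlines."""
    df = df.copy()
    possible_strings = df.columns[df.dtypes == "object"]
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\r\n", "", regex=False)
    )
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\n", "", regex=False)
    )
    return df


def _strip_newlines(value):
    # Only real strings are edited; booleans, None, dicts, etc. pass through.
    if isinstance(value, str):
        return value.replace("\r\n", "").replace("\n", "")
    return value


def clean_strings_only(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical alternative: strip newlines from str values, keep other values as-is."""
    df = df.copy()
    for col in df.columns[df.dtypes == "object"]:
        df[col] = df[col].map(_strip_newlines)
    return df


records = [
    {"Description": "line one\r\nline two", "IsDeleted": False},
    {"Description": "no\nnewline", "IsDeleted": None},
]
df = pd.DataFrame(records, dtype=object)

patched = clean_with_astype(df)  # no AttributeError any more
alternative = clean_strings_only(df)

print(patched["IsDeleted"].tolist())      # values converted to text by astype(str)
print(alternative["IsDeleted"].tolist())  # original booleans and missing value kept
```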
I've tested this and it works for me. Could somebody help me verify that the
type conversion is actually needed? If so, I'm keen to submit a PR with the fix
and a unit test.
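One way to check whether the conversion is needed for a given result set is to probe each object column on its own and see which ones pandas refuses. The helper columns_rejecting_str_accessor is hypothetical (not part of Airflow or pandas) and the sample records are made up. It creates the .str accessor per column, since that is the point where pandas validates the values, and reports the Python types found in each rejected column.

```python
import pandas as pd


def columns_rejecting_str_accessor(df: pd.DataFrame) -> dict:
    """Map each object column that rejects .str to the value types it holds.

    Hypothetical debugging helper, not part of Airflow.
    """
    rejected = {}
    for col in df.columns[df.dtypes == "object"]:
        try:
            _ = df[col].str  # pandas validates the column when the accessor is created
        except AttributeError:
            rejected[col] = sorted({type(v).__name__ for v in df[col]})
    return rejected


df = pd.DataFrame(
    [
        {"Name": "Acme\nCorp", "IsDeleted": False, "Amount": 10},
        {"Name": "Globex", "IsDeleted": None, "Amount": 20},
    ]
)
result = columns_rejecting_str_accessor(df)
print(result)  # only the boolean/None column should be listed
```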
> SalesforceHook attempts to use .str accessor on object dtype
> ------------------------------------------------------------
>
> Key: AIRFLOW-6481
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6481
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Affects Versions: 1.10.7
> Reporter: Teddy Hartanto
> Priority: Major
> Original Estimate: 1h
> Remaining Estimate: 1h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)