[jira] [Updated] (AIRFLOW-6481) SalesforceHook attempts to use .str accessor on object dtype

Teddy Hartanto (Jira) Sun, 05 Jan 2020 19:56:25 -0800


     [ 
https://issues.apache.org/jira/browse/AIRFLOW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Teddy Hartanto updated AIRFLOW-6481:
------------------------------------
    Description: 
I've searched through Airflow's issues and couldn't find any report regarding 
this. I wonder if I'm the only one who's facing this? 

 

*Bug description*

I'm using the SalesforceHook to fetch data from Salesforce and I encountered 
this exception:
{code:python}
AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', ...)
{code}
This exception occurs here:
{code:python}
if fmt == "csv":
    # there are also a ton of newline objects
    # that mess up our ability to write to csv
    # we remove these newlines so that the output is a valid CSV format
    self.log.info("Cleaning data and writing to CSV")
    possible_strings = df.columns[df.dtypes == "object"]
    df[possible_strings] = df[possible_strings].apply(
        lambda x: x.str.replace("\r\n", "")
    )
    df[possible_strings] = df[possible_strings].apply(
        lambda x: x.str.replace("\n", "")
    )

    # write the dataframe
    df.to_csv(filename, index=False)
{code}
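For reference, a minimal sketch of how this can be reproduced outside Airflow. The sample DataFrame and the helper name try_str_accessor are made up for illustration; the point is only that an object-dtype column is not guaranteed to contain strings, so the .str accessor can refuse it:

```python
import pandas as pd

# Hypothetical sample data: both columns have object dtype, but only
# "name" actually holds strings. "flag" holds Python booleans.
df = pd.DataFrame({
    "name": pd.Series(["a\r\nb", "c"], dtype=object),
    "flag": pd.Series([True, False], dtype=object),
})

# Same column selection as in the hook code quoted above.
possible_strings = df.columns[df.dtypes == "object"]


def try_str_accessor(series):
    """Report whether the .str accessor can be used on this column."""
    try:
        series.str.replace("\n", "")
        return "ok"
    except AttributeError:
        return "AttributeError"


results = {col: try_str_accessor(df[col]) for col in possible_strings}
print(results)
```

Both columns are selected by the dtype check, but only the one that really contains strings survives the .str call.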
To fix it, we can cast the object-dtype columns to str before using the .str accessor:
{code:python}
if fmt == "csv":
    ...
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\r\n", "")
    )
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\n", "")
    )
{code}
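One thing to watch for with the cast: depending on the pandas version, astype(str) can turn missing values into literal strings such as "None" or "nan", which would then end up in the CSV. A possible alternative, sketched here under the assumption that only real str values need cleaning, is to leave non-string values untouched. The helper name strip_newlines and the sample data are hypothetical and not part of SalesforceHook:

```python
import pandas as pd


def strip_newlines(value):
    """Remove newlines from str values; return anything else unchanged."""
    if isinstance(value, str):
        return value.replace("\r\n", "").replace("\n", "")
    return value


# Hypothetical sample data with a missing value and a non-string column.
df = pd.DataFrame({
    "note": pd.Series(["line1\r\nline2", None], dtype=object),
    "flag": pd.Series([True, False], dtype=object),
})

possible_strings = df.columns[df.dtypes == "object"]
for col in possible_strings:
    # Element-wise cleanup, so the .str accessor is never involved.
    df[col] = df[col].map(strip_newlines)
```

This trades the vectorised .str call for an element-wise map, which is slower on large frames but does not depend on what the object columns contain.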
I've tested this and it works for me. Could somebody help me verify that the 
type conversion is indeed needed? If so, I'm keen to submit a PR with the fix 
and a unit test.
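
A rough sketch of what such a unit test could look like, using the standard unittest module. It does not call SalesforceHook itself: clean_object_columns is a hypothetical stand-in that mirrors the proposed cast, and the sample values are invented:

```python
import unittest

import pandas as pd


def clean_object_columns(df):
    """Hypothetical stand-in for the CSV branch: cast to str, strip newlines."""
    df = df.copy()
    possible_strings = df.columns[df.dtypes == "object"]
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\r\n", "")
    )
    df[possible_strings] = df[possible_strings].astype(str).apply(
        lambda x: x.str.replace("\n", "")
    )
    return df


class CleanObjectColumnsTest(unittest.TestCase):
    def test_non_string_object_column_does_not_raise(self):
        # "flag" has object dtype but holds booleans, which is the kind of
        # column that triggers the AttributeError without the cast.
        df = pd.DataFrame({
            "note": pd.Series(["a\r\nb", "c\nd"], dtype=object),
            "flag": pd.Series([True, False], dtype=object),
        })
        cleaned = clean_object_columns(df)
        self.assertEqual(list(cleaned["note"]), ["ab", "cd"])
        self.assertEqual(list(cleaned["flag"]), ["True", "False"])
```

Saved as a module, this can be run with python -m unittest. A real test would exercise the hook's CSV branch rather than a copy of it.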


> SalesforceHook attempts to use .str accessor on object dtype
> ------------------------------------------------------------
>
>                 Key: AIRFLOW-6481
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6481
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.10.7
>            Reporter: Teddy Hartanto
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
