Maciek, thanks for the note via private message!
To everyone else: here is the solution for simply skipping the rows in which
a column entry combines a float with ">":
pd.read_csv('test_mw_r2.csv', sep=';',
            converters={'r2': lambda x: np.nan if x and x[0] == '>' else x}).dropna(axis=0)
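For anyone who wants to try this without the attached file, here is a minimal sketch of the same idea. The CSV text and the helper name skip_censored are made up for illustration; only the column name 'r2' and the ';' separator come from the call above. Note that the converter returns the untouched values as strings, so the sketch casts them to float after dropping the NaN rows.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for 'test_mw_r2.csv'; the real file is not shown in this thread.
csv_text = "item;r2\na;0.91\nb;>0.5\nc;0.42\n"


def skip_censored(value):
    """Return NaN for entries starting with '>', otherwise the raw string."""
    return np.nan if value.startswith(">") else value


df = pd.read_csv(io.StringIO(csv_text), sep=";", converters={"r2": skip_censored})

# Drop the rows that the converter marked as NaN.
df = df.dropna(axis=0)

# The converter hands back strings, so cast the surviving values explicitly.
df["r2"] = df["r2"].astype(float)

print(df)
```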
Paul
From: Maciek Wójcikowski [mailto:mac...@wojcikowski.pl]
Sent: Friday, 11 March 2016 12:29
To: Paul Czodrowski <paul.czodrow...@merckgroup.com>
Cc: rdkit <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Pandas dataframe manipulation
Hi Paul,
I would suggest:
* setting the dtype of the dataframe/column to str/object
* cleaning up the IC50s
* casting to float/int with dataframe.astype()
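The three steps above could be sketched roughly as follows. The CSV text and the column name "ic50" are assumptions for illustration, and the cleanup shown (stripping the ">" qualifier) is only one possible choice; it treats ">2" as 2, which may or may not be acceptable for your data.

```python
import io

import pandas as pd

# Hypothetical sample data; the column name "ic50" is an assumption.
csv_text = "item,ic50\na,1\nb,2\nc,>2\nd,5\n"

# Step 1: read the column as plain strings so nothing is guessed by the parser.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ic50": str})

# Step 2: clean up the values, here by removing the '>' qualifier.
df["ic50"] = df["ic50"].str.replace(">", "", regex=False).str.strip()

# Step 3: cast the cleaned strings to a numeric dtype.
df["ic50"] = df["ic50"].astype(float)

print(df.dtypes)
```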
Alternatively, you could use the "converters" argument:
pd.read_csv('filename.csv',
            converters={'ic50_colname': lambda x: x.replace('>', '')})
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
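A slightly extended sketch of the converters approach, assuming you want numbers rather than cleaned strings in the resulting column. The sample data and the helper name strip_qualifier are hypothetical; the column name is the placeholder used above. Empty cells are mapped to NaN so that float() never sees an empty string.

```python
import io

import pandas as pd

# Hypothetical sample data using the placeholder column name from above.
csv_text = "item,ic50_colname\na,1\nb,>2\nc,5\n"


def strip_qualifier(value):
    """Remove a '>' qualifier and return the remaining number as a float."""
    cleaned = value.replace(">", "").strip()
    return float(cleaned) if cleaned else float("nan")


df = pd.read_csv(io.StringIO(csv_text), converters={"ic50_colname": strip_qualifier})

print(df)
```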
----
Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
2016-03-11 11:12 GMT+01:00 Paul Czodrowski <paul.czodrow...@merckgroup.com>:
Dear RDKitters and heavy users of pandas DataFrames,
please find below a question concerning the conversion of pandas dataframes:
df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                   "row1": [1, 2, 3, ">2", 5],
                   "row2": [0.1, 0.2, 0.3, 0.4, 0.5],
                   "row3": ["ab", "cd", "ed", "gh", "ij"]})
df_new = df[df[["row1"]].applymap(np.isreal).all(1)]
I would like to get rid of the nasty ">2" entry in "row1". This works
perfectly with the snippet above.
However, when I read in a CSV file containing similar data (see the attached
CSV), the filtering does not work: all entries in the IC50 column are
discarded and end up as "NaN".
What is going wrong?
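A likely explanation, sketched under the assumption that the CSV column looks like "row1" above: in the hand-built DataFrame only ">2" is a string, whereas read_csv cannot parse the column as numeric and therefore keeps every value in it as a string, so a per-element "is this a number?" check fails for all rows. The sketch below shows the difference and one alternative filter based on pd.to_numeric; the inline CSV text is made up for illustration.

```python
import io

import pandas as pd

# In-memory frame as in the question: ints plus a single string.
df_mem = pd.DataFrame({"row1": [1, 2, 3, ">2", 5]})

# Hypothetical CSV with the same values; the attached file is not shown here.
df_csv = pd.read_csv(io.StringIO("row1\n1\n2\n3\n>2\n5\n"))

# Compare the Python types of the individual entries.
mem_types = [type(v).__name__ for v in df_mem["row1"]]
csv_types = [type(v).__name__ for v in df_csv["row1"]]
print(mem_types)
print(csv_types)

# pd.to_numeric with errors="coerce" turns unparseable entries into NaN,
# which gives a filter that also works on all-string columns.
mask = pd.to_numeric(df_csv["row1"], errors="coerce").notna()
df_clean = df_csv[mask].astype({"row1": float})
print(df_clean)
```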
Thanks & Cheers,
Paul
This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient, you
must not copy this message or attachment or disclose the contents to any other
person. If you have received this transmission in error, please notify the
sender immediately and delete the message and any attachment from your system.
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept
liability for any omissions or errors in this message which may arise as a
result of E-Mail-transmission or for damages resulting from any unauthorized
changes of the content of this message and any attachment thereto. Merck KGaA,
Darmstadt, Germany and any of its subsidiaries do not guarantee that this
message is free of viruses and does not accept liability for any damages caused
by any virus transmitted therewith.
Click http://www.merckgroup.com/disclaimer to access the German, French,
Spanish and Portuguese versions of this disclaimer.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss