Maciek, thanks for the note via private message!
To everyone: here is the solution for simply skipping rows whose entry in a 
column combines a float with ">" (for example ">2"):

pd.read_csv('test_mw_r2.csv', sep=';', converters={'r2': lambda x: np.nan if x and x[0] == '>' else x}).dropna(axis=0)
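
A minimal, self-contained sketch of the same idea follows. The CSV content and the helper name parse_r2 are made up for illustration, since the original file is not part of this message. Unlike the one-liner above, this variant also converts the remaining entries to float and treats empty entries as missing, which is an assumption about what is wanted.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for test_mw_r2.csv: semicolon-separated, with one
# censored value (">0.9") in the r2 column.
csv_text = "name;r2\na;0.5\nb;>0.9\nc;0.7\n"


def parse_r2(value):
    """Return NaN for empty or censored entries (e.g. '>0.9'), else a float."""
    value = value.strip()
    if not value or value.startswith(">"):
        return np.nan
    return float(value)


# The converter is called with the raw string of every cell in the r2 column.
df = pd.read_csv(io.StringIO(csv_text), sep=";", converters={"r2": parse_r2})

# Drop only the rows where r2 could not be used.
df = df.dropna(subset=["r2"])
```

Using dropna(subset=["r2"]) instead of dropna(axis=0) keeps rows that happen to have missing values in other, unrelated columns.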


Paul

From: Maciek Wójcikowski [mailto:mac...@wojcikowski.pl]
Sent: Friday, 11 March 2016 12:29
To: Paul Czodrowski <paul.czodrow...@merckgroup.com>
Cc: rdkit <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Pandas dataframe manipulation

Hi Paul,

I would suggest:

  *   setting the dtype of the dataframe/column to str/object
  *   cleaning up the IC50 values
  *   casting to float/int with DataFrame.astype()

Or alternatively you could use the "converters" argument:
pd.read_csv('filename.csv', converters={'ic50_colname': lambda x: x.replace('>', '')})
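
The three steps listed above could look roughly like this. It is only a sketch: the CSV content and the column name "ic50" are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; "ic50" is an assumed column name.
csv_text = "mol,ic50\nm1,12.5\nm2,>100\nm3,3\n"

# 1. Read the column as plain strings so nothing is parsed prematurely.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ic50": str})

# 2. Clean up the values: remove the ">" qualifier and stray whitespace.
df["ic50"] = df["ic50"].str.replace(">", "", regex=False).str.strip()

# 3. Cast the cleaned strings to float.
df["ic50"] = df["ic50"].astype(float)
```

Note that stripping the ">" turns a censored value such as ">100" into the exact value 100, so whether this or dropping the row is appropriate depends on the analysis.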

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-03-11 11:12 GMT+01:00 Paul Czodrowski <paul.czodrow...@merckgroup.com>:
Dear RDKit users and heavy users of pandas DataFrames,

Please find below a question concerning the conversion of pandas dataframes:
df = pd.DataFrame({
    "item": ["a", "b", "c", "d", "e"],
    "row1": [1, 2, 3, ">2", 5],
    "row2": [0.1, 0.2, 0.3, 0.4, 0.5],
    "row3": ["ab", "cd", "ed", "gh", "ij"],
})
df_new = df[df[["row1"]].applymap(np.isreal).all(1)]

I would like to get rid of the nasty ">2" entry in "row1". The snippet above 
does this perfectly.

However, when I read in a CSV file containing similar data (see the attached 
CSV), the conversion does not work: all values in the IC50 column are 
discarded and end up as "NaN".

What is going wrong?
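
One likely explanation, sketched under the assumption that the attached CSV resembles the made-up data below: a column built in memory keeps its Python ints next to the single string, whereas read_csv cannot parse a column containing ">2" as numbers and therefore returns every entry of that column as a string. A type-based filter then has nothing numeric left to keep. The last two lines show pd.to_numeric with errors="coerce" as one possible alternative filter; it is not the method used elsewhere in this thread.

```python
import io

import pandas as pd

# Column built in memory: four Python ints and one string.
in_memory = pd.DataFrame({"row1": [1, 2, 3, ">2", 5]})

# Hypothetical CSV with the same values; the real attachment is not shown here.
csv_text = "item;row1\na;1\nb;2\nc;3\nd;>2\ne;5\n"
from_csv = pd.read_csv(io.StringIO(csv_text), sep=";")

# Compare the Python type of every entry in both columns.
memory_types = [type(v).__name__ for v in in_memory["row1"]]
csv_types = [type(v).__name__ for v in from_csv["row1"]]
print(memory_types)  # ints plus one str
print(csv_types)     # str for every entry

# Alternative filter: coerce unparseable entries to NaN, then keep the rest.
numeric = pd.to_numeric(from_csv["row1"], errors="coerce")
cleaned = from_csv[numeric.notna()]
```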


Thanks & Cheers,
Paul



This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.



Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



