Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-07-05 Thread DmitriR
... sounds like a good explanation. If this is indeed the case, it would be 
reasonable to force flush to get guaranteed non-random behavior. 
Thanks!

Best,
Dmitri


> On Jul 5, 2016, at 7:39 AM, Brian Kelley  wrote:
> 
> After some digging, it looks like the underlying C++ streams aren't flushing.
> This means that Python might not actually have all the information when you
> print them out.  We may have to enable a "flush" function for these streams
> for better error reporting on the Python side; I'll need to investigate this
> a bit more.
> 
> Cheers,
>  Brian
> 
> On Wed, Jun 29, 2016 at 3:14 PM, DmitriR  wrote:
> Brian - Sure. Attached:
> 
> RDKit-test-warnings-01.ipynb
> dff.pkl
> screenshot1-warningsPrintToNotebook.pdf
> screenshot2-noWarningsPrint.pdf
> 
> This has gotten stranger though. Now sometimes I get no visible output 
> (screenshot 2).
> 
> The total length of captured warnings still differs run to run, but now I 
> noticed that it alternates *imprecisely* (see comment in screenshot2, cell 
> 32). When I compared the sets of warnings produced on alternating runs (where 
> the difference is substantial: 43k characters vs 38k characters), they are 
> different because a large number of warnings do not get produced. I don't 
> know what the smaller variations are due to.
> 
> Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0, Firefox
> 
> Thanks.
> Dmitri
> 
> 
> > On Jun 29, 2016, at 9:07 AM, Brian Kelley  wrote:
> >
> > Dmitri,
> >   Could you send me the notebook that displays these issues?  I can't 
> > reproduce them.
> >
> > Thanks,
> >  Brian
> >
> 
> 
> 


--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-07-05 Thread Brian Kelley
After some digging, it looks like the underlying C++ streams aren't
flushing.  This means that Python might not actually have all the
information when you print them out.  We may have to enable a "flush"
function for these streams for better error reporting on the Python side;
I'll need to investigate this a bit more.
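
To illustrate the effect described above, here is a minimal sketch in pure
Python. It is only an analogy: `sink` and `stream` are made-up stand-ins, and
this is not RDKit's actual C++ logging code. A buffered writer holds a short
message back until it is flushed, so a reader of the destination sees nothing
until then.

```python
import io

# Stand-in for the destination that the Python side eventually reads.
sink = io.BytesIO()
# Buffered layer in front of it; plays the role of the unflushed stream.
stream = io.BufferedWriter(sink, buffer_size=4096)

stream.write(b"WARNING: example message\n")
before_flush = sink.getvalue()  # the message is still sitting in the buffer

stream.flush()
after_flush = sink.getvalue()   # the message has now reached the destination

print(len(before_flush), len(after_flush))
```

If the flush happens at unpredictable times relative to when the captured text
is read, the amount of text seen can differ from run to run.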

Cheers,
 Brian

On Wed, Jun 29, 2016 at 3:14 PM, DmitriR  wrote:

> Brian - Sure. Attached:
>
> RDKit-test-warnings-01.ipynb
> dff.pkl
> screenshot1-warningsPrintToNotebook.pdf
> screenshot2-noWarningsPrint.pdf
>
> This has gotten stranger though. Now sometimes I get no visible output
> (screenshot 2).
>
> The total length of captured warnings still differs run to run, but now I
> noticed that it alternates *imprecisely* (see comment in screenshot2, cell
> 32). When I compared the sets of warnings produced on alternating runs
> (where the difference is substantial: 43k characters vs 38k characters),
> they are different because a large number of warnings do not get produced.
> I don't know what the smaller variations are due to.
>
> Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0,
> Firefox
>
> Thanks.
> Dmitri
>
>
> > On Jun 29, 2016, at 9:07 AM, Brian Kelley  wrote:
> >
> > Dmitri,
> >   Could you send me the notebook that displays these issues?  I can't
> > reproduce them.
> >
> > Thanks,
> >  Brian
> >
>
>
>


Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-29 Thread Brian Kelley
Dmitri,
  Could you send me the notebook that displays these issues?  I can't
reproduce them.

Thanks,
 Brian



On Tue, Jun 28, 2016 at 6:25 PM, Brian Kelley  wrote:

> It looks like there may be an issue calling WrapLogs twice.  If you see
> the error messages in the notebook, it's already been called.  Importing
> IPythonConsole does this automatically.
>
> This may be the cause of our confusion.  I'll look into it.
>
> 
> Brian Kelley
>
> On Jun 28, 2016, at 3:54 PM, DmitriR  wrote:
>
> Hi Brian,
>
> First off, now I can capture the warnings, so for practical purposes my
> question has been addressed, thank you for helping me get to this point.
>
> Cool trick with StringIO. I can even just do:
>
> Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0,
> Firefox
>
> ```
> import io
> import sys
> from rdkit import Chem
>
> err = sys.stderr
> sys.stderr = io.StringIO()
>
> # capture errors/warnings
> Chem.MolFromSmiles('C1CC')
>
> msgs = sys.stderr.getvalue()
> sys.stderr = err
> print('Captured', msgs)
>
> # now errors show in the notebook again
> Chem.MolFromSmiles('C1CC')
> ```
>
> ==
>
> However, if you feel like digging a bit deeper, I'm a little confused too
> now :)
>
> What is the scope of WrapLogs() effects? (notebook-wide, or cell?) Or, by
> chance, does it set anything really persistent?
>
> In my prior notebook session, prior to trying WrapLogs() I could already
> see the warnings printed on red background (like in your screenshot, except
> that you have an ERROR msg, not WARNINGs as in my example).
>
> A call to WrapLogs() made warnings apparently disappear from the notebook.
>
> Upon reinitializing the session I could see the warnings on red background
> as before, wrote the code snippet in my prior email, and *without calling
> WrapLogs()* I could capture the warnings with it.
>
> So I assumed that RDKit messages went to the notebook's stderr by default,
> and WrapLogs() did something else.
>
> After getting your last email, I made a minimal test case (new notebook
> with just the RDKit call that generates warnings `dff['InChI'] =
> dff['ROMol'].map(Chem.MolToInchi)`, wrapped inside the stderr capture code
> snippet), killed all python instances, restarted the browser, loaded data
> from pickled dataframe.
>
> Now, *without ever having called WrapLogs()*, all RDKit warnings still go
> to stderr, and I can still capture them using the snippet.
> Calls to WrapLogs() now appear to have no effect whatsoever.
>
> If this indicates to you any potential issue, we can look more into it.
> Otherwise I'm good.
>
> ==
>
> The other strange behavior that I described below (the number of warnings
> alternating between successive calls to the same code using
> Chem.MolToInchi) remains though. Maybe it's the underlying InChi code, I
> did not investigate.
>
> Thanks again.
> Dmitri
>
>
>
> On Jun 28, 2016, at 2:14 PM, Brian Kelley  wrote:
>
> Dmitri,
>   I admit to being a bit confused.  What WrapLogs() does is simply
> redirect the C++ errors into python's stderr. See attached png.  I think
> you may have noticed that, as you are capturing with sys.stderr.
>
> These errors are output (at least for me) in the IPython notebook.  I'm
> not sure what is being hidden here.  Perhaps the notebook has changed
> somehow?  Here is my version:
>
> Python 2.7.11 |Anaconda 2.1.0 (x86_64)| (default, Dec  6 2015, 18:57:58)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 4.0.0 -- An enhanced Interactive Python.
>
>
> btw - you can use StringIO as opposed to a file
>
> import sys
> from StringIO import StringIO  # Python 2 module
>
> err = sys.stderr
> io = sys.stderr = StringIO()
> # ... RDKit calls whose messages should be captured go here ...
> sys.stderr = err
> print io.getvalue()
>
>
>
> On Tue, Jun 28, 2016 at 1:24 PM, DmitriR  wrote:
> Brian - Thank you!
>
> (on OSX 10.11.5, jupyter 4.1.0)
>
> rdkit.Chem.WrapLogs() does hide the messages.
> I could not figure out how to access them though once they are hidden.
>
> To capture warnings, this mechanism seems to work - but it is ugly.
>
> ```
> import os
> import sys
> ## switch the streams
> stderr_fn = 'stderr.log'
> orig_stderr = sys.stderr
> sys.stderr = open(stderr_fn, 'w')
>
> ## RDKit code producing warnings goes here
>
> ## switch back stderr, process the warnings
> sys.stderr.flush()
> sys.stderr = orig_stderr
> with open(stderr_fn, 'r') as f: err_data = f.read()
> os.remove(stderr_fn)
> print(len(err_data))
> ```
>
> Assuming it is all even necessary, this could be made much nicer by using
> a context manager/decorator to handle stderr capture and return the
> warnings text in an extra argument, along the lines of
>
> http://stackoverflow.com/questions/5136611/capture-stdout-from-a-script-in-python
>
> ==
>
> But also I noticed something weird:
>
> If I re-run the notebook cell with code that produces warnings, I get *no
> warnings* every third or sometimes second invocation.
>
> And when I run this with data that produce a lot of warnings (hundreds), I
> get 

Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-28 Thread DmitriR
Hi Brian, 

First off, now I can capture the warnings, so for practical purposes my 
question has been addressed, thank you for helping me get to this point. 

Cool trick with StringIO. I can even just do:

Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0, Firefox

```
import io
import sys
from rdkit import Chem

err = sys.stderr
sys.stderr = io.StringIO()

# capture errors/warnings
Chem.MolFromSmiles('C1CC')

msgs = sys.stderr.getvalue()
sys.stderr = err
print('Captured', msgs)

# now errors show in the notebook again
Chem.MolFromSmiles('C1CC')
```
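
The same swap-and-restore pattern can be wrapped in a context manager so that
stderr is restored even if the wrapped code raises. This is a minimal sketch
using `contextlib.redirect_stderr` from the standard library (Python 3.5+).
`captured_stderr` and `noisy_call` are hypothetical names; `noisy_call` stands
in for an RDKit call, and the sketch assumes the messages actually reach
Python's `sys.stderr`.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def captured_stderr():
    """Temporarily replace sys.stderr with an in-memory buffer and yield it."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        yield buffer


def noisy_call():
    # Hypothetical stand-in for a call whose messages go to sys.stderr.
    print("WARNING: example message", file=sys.stderr)
    return 42


with captured_stderr() as err:
    result = noisy_call()

messages = err.getvalue()
print("Captured", len(messages), "characters; result =", result)
```

Anything written outside the `with` block goes to the original stderr again.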

==

However, if you feel like digging a bit deeper, I'm a little confused too now :)

What is the scope of WrapLogs() effects? (notebook-wide, or cell?) Or, by 
chance, does it set anything really persistent?

In my prior notebook session, prior to trying WrapLogs() I could already see 
the warnings printed on red background (like in your screenshot, except that 
you have an ERROR msg, not WARNINGs as in my example). 

A call to WrapLogs() made warnings apparently disappear from the notebook. 

Upon reinitializing the session I could see the warnings on red background as 
before, wrote the code snippet in my prior email, and *without calling 
WrapLogs()* I could capture the warnings with it. 

So I assumed that RDKit messages went to the notebook's stderr by default, and 
WrapLogs() did something else. 

After getting your last email, I made a minimal test case (new notebook with 
just the RDKit call that generates warnings `dff['InChI'] = 
dff['ROMol'].map(Chem.MolToInchi)`, wrapped inside the stderr capture code 
snippet), killed all python instances, restarted the browser, loaded data from 
pickled dataframe. 

Now, *without ever having called WrapLogs()*, all RDKit warnings still go 
to stderr, and I can still capture them using the snippet. Calls to WrapLogs() 
now appear to have no effect whatsoever. 

If this indicates to you any potential issue, we can look more into it. 
Otherwise I'm good. 

==

The other strange behavior that I described below (the number of warnings 
alternating between successive calls to the same code using Chem.MolToInchi) 
remains though. Maybe it's the underlying InChi code, I did not investigate. 
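
For anyone who wants to see exactly which messages differ between two runs,
here is a small sketch that compares two captured strings line by line.
`diff_messages` is a hypothetical helper, the sample text is made up, and the
timestamp pattern assumes the `[HH:MM:SS]` form seen in the messages quoted in
this thread.

```python
import re
from collections import Counter

# Timestamps such as "[14:11:32]" differ between runs, so strip them first.
_TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")


def _count_lines(text):
    """Count non-empty message lines, ignoring timestamps."""
    lines = (_TIMESTAMP.sub("", line).strip() for line in text.splitlines())
    return Counter(line for line in lines if line)


def diff_messages(run_a, run_b):
    """Return (only_in_a, only_in_b) as Counters of message lines."""
    a, b = _count_lines(run_a), _count_lines(run_b)
    return a - b, b - a


# Made-up sample text standing in for two captured runs.
run_a = "RDKit WARNING: [10:00:01] message A\nRDKit WARNING: [10:00:01] message B\n"
run_b = "RDKit WARNING: [10:05:09] message A\n"

only_in_a, only_in_b = diff_messages(run_a, run_b)
print(dict(only_in_a), dict(only_in_b))
```

Using counts rather than plain sets keeps repeated identical warnings visible
in the comparison.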

Thanks again.
Dmitri



> On Jun 28, 2016, at 2:14 PM, Brian Kelley  wrote:
> 
> Dmitri,
>   I admit to being a bit confused.  What WrapLogs() does is simply redirect 
> the C++ errors into python's stderr. See attached png.  I think you may have 
> noticed that, as you are capturing with sys.stderr.
> 
> These errors are output (at least for me) in the IPython notebook.  I'm not 
> sure what is being hidden here.  Perhaps the notebook has changed somehow?  
> Here is my version:
> 
> Python 2.7.11 |Anaconda 2.1.0 (x86_64)| (default, Dec  6 2015, 18:57:58) 
> Type "copyright", "credits" or "license" for more information.
> 
> IPython 4.0.0 -- An enhanced Interactive Python.
> 
> 
> btw - you can use StringIO as opposed to a file
> 
> import sys
> from StringIO import StringIO  # Python 2 module
> 
> err = sys.stderr
> io = sys.stderr = StringIO()
> # ... RDKit calls whose messages should be captured go here ...
> sys.stderr = err
> print io.getvalue()
> 
> 
> 
> On Tue, Jun 28, 2016 at 1:24 PM, DmitriR  wrote:
> Brian - Thank you!
> 
> (on OSX 10.11.5, jupyter 4.1.0)
> 
> rdkit.Chem.WrapLogs() does hide the messages.
> I could not figure out how to access them though once they are hidden.
> 
> To capture warnings, this mechanism seems to work - but it is ugly.
> 
> ```
> import os
> import sys
> ## switch the streams
> stderr_fn = 'stderr.log'
> orig_stderr = sys.stderr
> sys.stderr = open(stderr_fn, 'w')
> 
> ## RDKit code producing warnings goes here
> 
> ## switch back stderr, process the warnings
> sys.stderr.flush()
> sys.stderr = orig_stderr
> with open(stderr_fn, 'r') as f: err_data = f.read()
> os.remove(stderr_fn)
> print(len(err_data))
> ```
> 
> Assuming it is all even necessary, this could be made much nicer by using a 
> context manager/decorator to handle stderr capture and return the warnings 
> text in an extra argument, along the lines of
> http://stackoverflow.com/questions/5136611/capture-stdout-from-a-script-in-python
> 
> ==
> 
> But also I noticed something weird:
> 
> If I re-run the notebook cell with code that produces warnings, I get *no 
> warnings* every third or sometimes second invocation.
> 
> And when I run this with data that produce a lot of warnings (hundreds), I 
> get different number of warnings between runs, at least with this call:
> 
> ```
> #dff is a pandas dataframe
> dff['InChI'] = dff['ROMol'].map(Chem.MolToInchi)
> ```
> 
> it cycles higher-number -> lower-number -> higher-number ... Not sure what to 
> make of it. Something screwed up with my system?
> 
> Dmitri
> 
> 
> 
> > On Jun 28, 2016, at 8:24 AM, Brian Kelley  wrote:
> >
> > Dmitri,  if you import rdkit.Chem.Draw.IPythonConsole the c++ errors and 
> > warnings should be seen in IPython.  This doesn't appear to work on Windows 
> > yet, sadly.
> >
> > This is enabled by the 

Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-28 Thread DmitriR
Hi Greg - 

Thank you very much for the clear and detailed explanation!

(and, now that I have a chance to say this, thank you for putting the project 
together; being able to work with chemistry in the python notebook is great, 
and having hooks into pandas is really cool)

In this case I was basically just going through the example code and ran into 
some behaviors that I did not understand (and you kindly explained). So it's 
all clear now. The uppercase aromatic atoms in the MCS output do appear to be 
a bug; Hs on aromatic nitrogens I'll need to fix manually or with a transform. 

==

Separately, on another thing that came up in my working through that data:

I'd like to add my 2cents-equivalent of vote toward a bit fuller control of 
warnings produced by the C++ backend. In that example's data I was getting a 
lot of (fully valid, I think) warnings about stereochemistry, but I could not 
do anything to catch or hide them - and in an ipython notebook, it can get less 
than tidy. I did see this mentioned in other threads, so I understand that 
logging is a known issue somewhere on the stack. For now I just clean up 
manually.

Thanks again!

Kind regards,
Dmitri



> On Jun 28, 2016, at 1:39 AM, Greg Landrum  wrote:
> 
> Hi Dmitri,
> 
> The results that come back from the MCS in that example really describe 
> queries, not necessarily stable molecules or things that can be accurately 
> translated into SMILES.
> 
> I'll describe below what's going on to cause the error, but the more 
> important question is: what are you trying to do?
> 
> In this case there are two problems. One has to do with the aromatic bonds in 
> the SMILES coming from C atoms that are written as capital letters. Here's a 
> simplified version of your example:
> 
> In [11]: Chem.MolFromSmiles('O=C1:[NH]:C:N:N2:C:*:C:C:1:2')
> [06:43:37] Explicit valence for atom # 1 C, 5, is greater than permitted
> 
> If I rewrite the SMILES to have the atoms with aromatic bonds written with 
> lower case letters everything is fine:
> 
> In [12]: Chem.MolFromSmiles('O=c1:[nH]:c:n:n2:c:*:c:c:1:2')
> Out[12]: 
> 
> This shouldn't make a difference in SMILES, so I'm inclined to think that 
> it's a bug.
> 
> The second problem was the missing hydrogen specification on the aromatic 
> nitrogen that has an H (I fixed this in the SMILES above). Since the RDKit 
> does not attempt to guess at chemistry, the general rule is that aromatic 
> heteroatoms should have Hs specified if they have any. There have been a 
> number of mailing list threads on this topic.
> 
> Best,
> -greg
> 
> 
> 
> 
> On Mon, Jun 27, 2016 at 8:26 PM, DmitriR  wrote:
> Dear RDKitters, 
> 
> I would appreciate any comments on the following:
> 
> I am looking at the 'SureChEMBL iPython Notebook Tutorial' 
> http://nbviewer.jupyter.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb
> 
> following along with rdkit '2016.03.1' on OSX 
> 
> In Cell 142, there is this SMILES: 
> 
> MCS SMILES: O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2
> This is a representation of a generalized structure, not any particular 
> molecule.
> 
> It was generated with Chem.MolToSmiles(mcsM,isomericSmiles=True) 
> 
> But when I try 
> Chem.MolFromSmiles('O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2')
> 
> I get "RDKit ERROR: [14:11:32] Explicit valence for atom # 1 C, 5, is greater 
> than permitted"
> 
> So there is no "round-trip" possible here. 
> 
> Which behavior is "correct", given the aromaticity and structure as specified?
> Should this be rendering/creating molecule, or failing?
> 
> Thanks!
> 
> (MarvinSketch does display the SMILES without complaints;
> image is attached)
> 
> Dmitri
> 
> 
> 
> 
> 
> 



Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-27 Thread Greg Landrum
Hi Dmitri,

The results that come back from the MCS in that example really describe
queries, not necessarily stable molecules or things that can be accurately
translated into SMILES.

I'll describe below what's going on to cause the error, but the more
important question is: what are you trying to do?

In this case there are two problems. One has to do with the aromatic bonds
in the SMILES coming from C atoms that are written as capital letters.
Here's a simplified version of your example:

In [11]: Chem.MolFromSmiles('O=C1:[NH]:C:N:N2:C:*:C:C:1:2')
[06:43:37] Explicit valence for atom # 1 C, 5, is greater than permitted

If I rewrite the SMILES to have the atoms with aromatic bonds written with
lower case letters everything is fine:

In [12]: Chem.MolFromSmiles('O=c1:[nH]:c:n:n2:c:*:c:c:1:2')
Out[12]: 

This shouldn't make a difference in SMILES, so I'm inclined to think that
it's a bug.

The second problem was the missing hydrogen specification on the aromatic
nitrogen that has an H (I fixed this in the SMILES above). Since the RDKit
does not attempt to guess at chemistry, the general rule is that aromatic
heteroatoms should have Hs specified if they have any. There have been a
number of mailing list threads on this topic.
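
As a possible workaround for the first problem, here is a rough string-level
sketch that lower-cases atoms taking part in an explicit `:` bond.
`lowercase_aromatic_atoms` is a hypothetical helper, not an RDKit function. It
assumes a simple SMILES subset (one-letter B, C, N, O, P, S atoms, bracket
atoms, branches, ring closures) and it does not add missing Hs on aromatic
nitrogens.

```python
import re

# Bracket atoms, two-letter halogens, two-digit ring labels, then any single character.
_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|%\d\d|.")
_BONDS = {"-", "=", "#", "$", ":", "/", "\\"}
_CAN_BE_AROMATIC = {"B", "C", "N", "O", "P", "S"}


def lowercase_aromatic_atoms(smiles):
    """Lower-case atoms that take part in an explicit ':' bond (hypothetical helper)."""
    tokens = _TOKEN.findall(smiles)
    aromatic = set()   # token positions of atoms touching a ':' bond
    prev = None        # position of the atom the next bond attaches to
    branches = []      # saved `prev` values for open parentheses
    bond = None        # explicit bond symbol waiting for its second atom
    open_rings = {}    # ring label -> (atom position, bond symbol at opening)

    for pos, tok in enumerate(tokens):
        if tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev, bond = branches.pop(), None
        elif tok in _BONDS:
            bond = tok
        elif tok == ".":
            prev, bond = None, None
        elif tok.isdigit() or tok.startswith("%"):
            if tok in open_rings:
                other, other_bond = open_rings.pop(tok)
                if ":" in (bond, other_bond):
                    aromatic.update((prev, other))
            else:
                open_rings[tok] = (prev, bond)
            bond = None
        else:  # an atom token
            if bond == ":" and prev is not None:
                aromatic.update((prev, pos))
            prev, bond = pos, None

    out = []
    for pos, tok in enumerate(tokens):
        if pos in aromatic:
            if tok in _CAN_BE_AROMATIC:
                tok = tok.lower()
            elif tok.startswith("["):
                # Lower-case only a one-letter element symbol, e.g. [NH] -> [nH].
                tok = re.sub(r"^\[(\d*)([BCNOPS])(?![a-z])",
                             lambda m: "[" + m.group(1) + m.group(2).lower(), tok)
        out.append(tok)
    return "".join(out)


fixed = lowercase_aromatic_atoms("O=C1:[NH]:C:N:N2:C:*:C:C:1:2")
print(fixed)
```

Applied to the upper-case SMILES from In [11], this produces the lower-case
string used in In [12].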

Best,
-greg




On Mon, Jun 27, 2016 at 8:26 PM, DmitriR  wrote:

> Dear RDKitters,
>
> I would appreciate any comments on the following:
>
> I am looking at the 'SureChEMBL iPython Notebook Tutorial'
>
> http://nbviewer.jupyter.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb
>
> following along with rdkit '2016.03.1' on OSX
>
> In Cell 142, there is this SMILES:
>
> MCS SMILES: O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2
> This is a representation of a generalized structure, not any
> particular molecule.
>
> It was generated with Chem.MolToSmiles(mcsM,isomericSmiles=True)
>
> But when I try
> Chem.MolFromSmiles('O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2')
>
> I get "RDKit ERROR: [14:11:32] Explicit valence for atom # 1 C, 5, is
> greater than permitted"
>
> So there is no "round-trip" possible here.
>
> Which behavior is "correct", given the aromaticity and structure as
> specified?
> Should this be rendering/creating molecule, or failing?
>
> Thanks!
>
> (MarvinSketch does display the SMILES without complaints;
> image is attached)
>
> Dmitri
>
>
>
>
>
>


[Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-27 Thread DmitriR
Dear RDKitters, 

I would appreciate any comments on the following:

I am looking at the 'SureChEMBL iPython Notebook Tutorial' 
http://nbviewer.jupyter.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb

following along with rdkit '2016.03.1' on OSX 

In Cell 142, there is this SMILES: 

MCS SMILES: O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2
This is a representation of a generalized structure, not any particular 
molecule.

It was generated with Chem.MolToSmiles(mcsM,isomericSmiles=True) 

But when I try 
Chem.MolFromSmiles('O=C1:N:C(C2:C:C:C:C:C:2):N:N2:C:[*]:C:C:1:2')

I get "RDKit ERROR: [14:11:32] Explicit valence for atom # 1 C, 5, is greater 
than permitted"

So there is no "round-trip" possible here. 

Which behavior is "correct", given the aromaticity and structure as specified?
Should this be rendering/creating molecule, or failing?

Thanks!

(MarvinSketch does display the SMILES without complaints;
image is attached)

Dmitri

