Re: [Rdkit-discuss] mol_from_ctab doesn't preserve coordinates

2020-05-10 Thread Jan Holst Jensen

Hi Sharang (adding the list as I missed reply-all previously),

Glad to hear that RDKit does what it is supposed to.

I am unfortunately not familiar with how Datawarrior processes
molecules, so I don't know whether it does its own layout. If you can
feed Datawarrior a molfile directly from disk, that takes RDKit out of
the equation: you can then see whether Datawarrior preserves the
molfile's layout or not.


Cheers
-- Jan

On 2020-05-09 16:32, Sharang Phatak wrote:

Hi Jan,

Thank you for your message. I went back and checked with mol_to_ctab
and found that the coordinates are indeed identical. I am visualizing
these structures in Datawarrior through its native SQL integration
tool, and there they are displayed a bit differently (flipped or
rotated). Perhaps it's a separate issue unrelated to RDKit. If you've
experienced such issues, I'd much appreciate hearing what you did to
overcome them.
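The coordinate check described above can also be reproduced outside the database. The sketch below is plain Python with no RDKit dependency; `atom_coords` and `same_coordinates` are hypothetical helpers, not part of RDKit or the cartridge. It assumes V2000 molfiles with the usual 3-line header and whitespace-separated atom fields, and compares the atoms of the molfile stored in the table with those of the molfile returned by mol_to_ctab().

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def atom_coords(molblock: str) -> List[Atom]:
    """Return (symbol, x, y, z) for each atom in a V2000 molfile.

    Hypothetical helper for illustration; assumes a 3-line header,
    then the counts line, then one line per atom.
    """
    lines = molblock.splitlines()
    counts = lines[3]
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 molfiles")
    n_atoms = int(counts[0:3])  # atom count occupies columns 1-3
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Assumes x, y, z and the symbol are whitespace-separated, which
        # holds unless a coordinate fills its whole 10-character field.
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def same_coordinates(a: str, b: str, tol: float = 1e-4) -> bool:
    """True if both molfiles list the same atoms at the same positions."""
    atoms_a, atoms_b = atom_coords(a), atom_coords(b)
    if len(atoms_a) != len(atoms_b):
        return False
    return all(
        sym_a == sym_b and all(abs(p - q) <= tol for p, q in zip(xyz_a, xyz_b))
        for (sym_a, *xyz_a), (sym_b, *xyz_b) in zip(atoms_a, atoms_b)
    )
```

Comparing the molfile passed to mol_from_ctab(..., true) with what mol_to_ctab() returns should give True, while an RDKit-regenerated 2D layout should give False. Atom order is assumed to be preserved by the round trip.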


Cheers,
Sharang

On Thu, May 7, 2020 at 3:22 AM Jan Holst Jensen wrote:


Hi Sharang,

Are you using a very old version of RDKit?

When I call mol_from_ctab() the way you do, with the second argument
set to true, it does preserve coordinates for me:

select mol_to_ctab(mol_from_ctab('


  4  3  0  0  0  0  0  0  1  0999 V2000
    5.0000    5.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0000    4.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0000    3.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.0000    3.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  1  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
M  END
'::cstring, true));


  RDKit  3D

  4  3  0  0  0  0  0  0  0  0999 V2000
    5.0000    5.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0000    4.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0000    3.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.0000    3.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  1
  2  3  1  0
  2  4  1  0
M  END


If I leave out the second optional boolean parameter, which defaults
to false, the coordinates are regenerated by RDKit:

select mol_to_ctab(mol_from_ctab('


  4  3  0  0  0  0  0  0  1  0999 V2000
    5.0000    5.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0000    4.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0000    3.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.0000    3.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  1  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
M  END
'::cstring));


  RDKit  2D

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  6
  2  3  1  0
  2  4  1  0
M  END

This is on a fairly old RDKit 2016_09_4 on Postgres 9.6. Earlier
versions would ignore the second parameter; that was fixed around the
2016_09 release if I recall correctly.

Cheers
-- Jan Holst Jensen

On 2020-05-07 00:21, Sharang Phatak wrote:
> Hi,
>
> I am following the documentation for postgres / rdkit. I have a table
> with valid molfiles, as confirmed by is_valid_ctab(). I am then
> trying to insert into a table 'mols' using
> mol_from_ctab(molfile::cstring,true).
>
> However, the coordinates are not preserved. Is there something I am
> missing?
>
> Thank you,
> Sharang





___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-10 Thread Paolo Tosco

Dear Quoc-Tuan,

I think I have come up with a reasonably fast algorithm that seems to
be more robust:


https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.

On 06/05/2020 09:11, Quoc-Tuan DO wrote:

Dear Paolo,

Thank you again for your code, and sorry for bothering you again. It
works fine for monoterpenes but not for sesquiterpenes, diterpenes or
triterpenes.


pattern: C~C~C(~C)~C

mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C

=> ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))

It should find 4 distinct units.

mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C

=> ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))

It should find 6 distinct units.

I also tried a SMARTS version of the pattern,
[#6]~[#6]~[#6](~[#6])~[#6], but got the same results as with the
SMILES one.


What do you think? Is there something missing in the query?
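The gist's code is not reproduced here. One plausible reading of the problem: GetSubstructMatches() returns every match of the pattern, and in fused ring systems those matches overlap, so keeping non-overlapping matches in the order they are returned can stop short of the maximum number of units. Picking the largest set of pairwise-disjoint matches is a set-packing problem. The sketch below is an exhaustive backtracking search in plain Python, not Paolo's algorithm; `max_disjoint_matches` and its `limit` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Match = Tuple[int, ...]


def max_disjoint_matches(matches: Sequence[Match],
                         limit: Optional[int] = None) -> List[Match]:
    """Return a largest subset of `matches` with no shared atom indices.

    Hypothetical helper, not RDKit API. Exhaustive backtracking, so the
    worst case is exponential in the number of matches. If `limit` is a
    known upper bound on the answer, the search stops once it is reached.
    """
    atom_sets = [frozenset(m) for m in matches]
    best: List[int] = []

    def search(start: int, chosen: List[int], used: frozenset) -> bool:
        """Extend `chosen`; return True when the search can stop early."""
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
            if limit is not None and len(best) >= limit:
                return True  # the upper bound is reached, cannot do better
        # Prune: even adding every remaining match could not beat `best`.
        if len(chosen) + (len(atom_sets) - start) <= len(best):
            return False
        for i in range(start, len(atom_sets)):
            if atom_sets[i].isdisjoint(used):
                chosen.append(i)
                stop = search(i + 1, chosen, used | atom_sets[i])
                chosen.pop()
                if stop:
                    return True
        return False

    search(0, [], frozenset())
    return [tuple(matches[i]) for i in best]
```

With RDKit, `matches` would be the tuples returned by something like `mol.GetSubstructMatches(Chem.MolFromSmarts('[#6]~[#6]~[#6](~[#6])~[#6]'))`. Because each unit uses five distinct carbons, the carbon count integer-divided by five (22 // 5 = 4 for mol1, 30 // 5 = 6 for mol2) is an upper bound that can be passed as `limit` without losing optimality.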

Thanks for your time,

Best regards,

QT



On 05/05/2020 at 14:52, Paolo Tosco wrote:


Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.





