Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-11 Thread Greenpharma S.A.S.
Dear Paolo,
Thank you very much. I'll test this and get back to you.
Have a nice day.
Best regards,
Quoc-Tuan

> On 10 May 2020 at 13:09, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
> 
> 
> Dear Quoc-Tuan,
> 
> I think I have come up with a reasonably fast algorithm that seems to be
> more robust:
> 
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
> 
> Cheers,
> p.
> 
> On 06/05/2020 09:11, Quoc-Tuan DO wrote:
> 
> > Dear Paolo,
> >
> > Thank you again for your code. Sorry for bothering you again. It works
> > fine for monoterpenes, but not for sesquiterpenes, diterpenes, or
> > triterpenes.
> >
> > pattern: C~C~C(~C)~C
> >
> > mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C
> >
> > => ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))
> >
> > It should find 4 distinct units.
> >
> > mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C
> >
> > => ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))
> >
> > It should find 6 distinct units.
> >
> > I tried a SMARTS version of the pattern, [#6]~[#6]~[#6](~[#6])~[#6],
> > but got the same results as with SMILES.
> >
> > What do you think? Is there something missing in the query?
> >
> > Thanks for your time,
> >
> > Best regards,
> >
> > QT
> >
> 
> > On 05/05/2020 at 14:52, Paolo Tosco wrote:
> >
> > > Dear Quoc-Tuan,
> > >
> > > this should do what you need:
> > >
> > > https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
> > >
> > > Cheers,
> > > p.
> > >
> >
> 
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Greenpharma S.A.S.
Dear All,

Could you please help with the following problem? (I could not find an answer
in the discussion list.)

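from rdkit import Chem

pattern = 'C~C~C(~C)~C'
smiles = 'O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'

# Parse the query pattern and the target molecule
pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
# Find all matches of the pattern in the molecule
res = mol.GetSubstructMatches(pat, uniquify=True)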


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10), (2, 1, 
5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9), (2, 3, 4, 5, 10), 
(2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7), (3, 4, 5, 6, 7), (5, 4, 3, 
2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10), (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), 
(7, 5, 4, 3, 10), (7, 5, 4, 9, 10), (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 
5, 10), (8, 3, 4, 9, 10), (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), 
(9, 4, 3, 2, 8), (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 
2, 8), (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))


I expected only 2 matches with uniquify=True, since the molecule contains only
2 units of the pattern. Furthermore, I get the same results with or without
uniquify. I also expected 2 "independent" lists, but here there is always at
least one atom in common between the lists.
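
A note on what `uniquify=True` removes may help here. It discards matches that cover exactly the same set of atoms (the same atoms mapped in a different order). It does not discard matches that merely overlap. The sketch below mimics that rule in plain Python; `uniquify_by_atom_set` is a hypothetical helper written for illustration, not RDKit's implementation.

```python
def uniquify_by_atom_set(matches):
    """Keep the first match seen for each distinct set of atom indices."""
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(match)  # the order of the atoms is ignored
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


matches = [
    (1, 5, 4, 3, 9),   # one mapping of the pattern onto atoms {1, 3, 4, 5, 9}
    (1, 5, 4, 9, 3),   # same atoms, the two branch atoms swapped: a duplicate
    (1, 5, 4, 3, 10),  # shares four atoms with the first, but is a different set
]
unique = uniquify_by_atom_set(matches)
print(unique)
```

Only the second tuple is dropped. The third survives even though it overlaps heavily with the first, which is consistent with the 35 overlapping matches listed above.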

Is there something I have misunderstood or misused?
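
One way to obtain "independent" units from such a result is to post-process the match list and keep a largest subset of matches that share no atoms. The following is a minimal sketch under that assumption; it is not necessarily the approach used in the gist linked elsewhere in this thread, and `max_disjoint_matches` is a hypothetical name. It uses a plain backtracking search, which is fine for a few dozen matches but can become slow for large molecules.

```python
def max_disjoint_matches(matches):
    """Return a largest subset of matches in which no atom index is shared."""
    sets = [frozenset(m) for m in matches]
    best = []

    def extend(start, chosen, used):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for i in range(start, len(matches)):
            # Only add a match that shares no atom with those already chosen
            if used.isdisjoint(sets[i]):
                chosen.append(matches[i])
                extend(i + 1, chosen, used | sets[i])
                chosen.pop()

    extend(0, [], frozenset())
    return best


# The 35 matches reported above for O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C
matches = [
    (1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
    (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
    (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
    (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
    (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
    (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
    (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
    (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
    (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7),
]
units = max_disjoint_matches(matches)
print(units)
```

For this molecule the search returns two matches that together cover carbons 1 to 10. The choice is not unique: more than one pair of disjoint matches exists in the list, and the sketch simply keeps the first largest subset it finds.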

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan