Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-11 Thread Greenpharma S.A.S.
Dear Paolo,
Thank you very much. I'll test this and get back to you.
Have a nice day.
Best regards,
Quoc-Tuan

> On 10 May 2020 at 13:09, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
> 
> 
> Dear Quoc-Tuan,
> 
> I think I have come up with a reasonably fast algorithm that seems to be
> more robust:
> 
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
> 
> Cheers,
> p.
> 
> On 06/05/2020 09:11, Quoc-Tuan DO wrote:
>
> > Dear Paolo,
> >
> > Thank you again for your code. Sorry for bothering you again. It works
> > fine for monoterpenes but not for diterpenes, sesquiterpenes or
> > triterpenes.
> >
> > pattern: C~C~C(~C)~C
> >
> > mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C
> >
> > => ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))
> >
> > It should find 4 distinct units.
> >
> > mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C
> >
> > => ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))
> >
> > It should find 6 distinct units.
> >
> > I tried a SMARTS version of the pattern,
> > [#6]~[#6]~[#6](~[#6])~[#6], but got the same results as with SMILES.
> >
> > What do you think? Is there something missing in the query?
> >
> > Thanks for your time,
> >
> > Best regards,
> >
> > QT
> >
> > On 05/05/2020 at 14:52, Paolo Tosco wrote:
> >
> > > Dear Quoc-Tuan,
> > >
> > > this should do what you need:
> > >
> > > https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
> > >
> > > Cheers,
> > > p.
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-10 Thread Paolo Tosco

Dear Quoc-Tuan,

I think I have come up with a reasonably fast algorithm that seems to be 
more robust:


https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
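The gist itself is not reproduced in this thread, so what follows is only a hypothetical illustration of why a simple selection strategy can undercount units. It is not a description of the algorithm in the gist. It assumes the matches are plain tuples of atom indices, as returned by GetSubstructMatches(); the toy indices and both function names are made up for this sketch.

```python
from itertools import combinations


def greedy_disjoint(matches):
    """Keep each match that shares no atom with the matches kept so far."""
    used, kept = set(), []
    for match in matches:
        if used.isdisjoint(match):
            kept.append(match)
            used.update(match)
    return kept


def largest_disjoint(matches):
    """Try subsets from largest to smallest; return the first disjoint one."""
    for size in range(len(matches), 0, -1):
        for subset in combinations(matches, size):
            atoms = [a for match in subset for a in match]
            if len(atoms) == len(set(atoms)):  # no atom is used twice
                return list(subset)
    return []


# Toy data (hypothetical indices, not taken from a real molecule).
toy_matches = [(1, 2), (0, 1), (2, 3)]

greedy = greedy_disjoint(toy_matches)
best = largest_disjoint(toy_matches)
print(len(greedy), len(best))  # prints "1 2"
```

The greedy pass depends on the order of the input: it keeps (1, 2) first and then has to reject both remaining matches. The exhaustive search is order-independent but exponential in the number of matches, so it is only practical for short lists.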

Cheers,
p.

On 06/05/2020 09:11, Quoc-Tuan DO wrote:

Dear Paolo,

Thank you again for your code. Sorry for bothering you again. It works 
fine for monoterpenes but not for diterpenes, sesquiterpenes or 
triterpenes.


pattern: C~C~C(~C)~C

mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C

=> ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))

It should find 4 distinct units.

mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C

=> ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))

It should find 6 distinct units.

I tried a SMARTS version of the pattern, 
[#6]~[#6]~[#6](~[#6])~[#6], but got the same results as with SMILES.


What do you think? Is there something missing in the query?

Thanks for your time,

Best regards,

QT



On 05/05/2020 at 14:52, Paolo Tosco wrote:


Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.








Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-06 Thread Quoc-Tuan DO

Dear Paolo,

Thank you again for your code. Sorry for bothering you again. It works 
fine for monoterpenes but not for diterpenes, sesquiterpenes or 
triterpenes.


pattern: C~C~C(~C)~C

mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C

=> ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))

It should find 4 distinct units.

mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C

=> ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))

It should find 6 distinct units.

I tried a SMARTS version of the pattern, [#6]~[#6]~[#6](~[#6])~[#6], 
but got the same results as with SMILES.


What do you think? Is there something missing in the query?

Thanks for your time,

Best regards,

QT



On 05/05/2020 at 14:52, Paolo Tosco wrote:


Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.







Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Jean-Marc Nuzillard

Dear Paolo,

this answers my question as well, but in an unexpected way.

Best,

Jean-Marc


On 05/05/2020 at 14:52, Paolo Tosco wrote:


Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.

On 05/05/2020 11:52, Quoc-Tuan DO wrote:


Dear Paolo,

Thank you for your reply.

I understand now... I first ran it without the uniquify option and then 
with uniquify=True only. I thought the default would be uniquify=False.


Actually my problem is to find 2 distinct units of isoprene (pattern) 
in borneol (smiles) as the latter is a monoterpene.


Do you have any idea how I can do this?

Thanks in advance for your time.

Best regards,

QT



On 04/05/2020 at 19:53, Paolo Tosco wrote:


Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:


Dear All,

Please could you help with the following problem (I could not find an 
answer in the discussion list)?


pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'


pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 
10), (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 
5, 9), (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 
5, 1, 7), (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 
4, 3, 10), (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 
5, 4, 9, 10), (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), 
(8, 3, 4, 9, 10), (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 
6), (9, 4, 3, 2, 8), (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 
7), (10, 4, 3, 2, 8), (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 
5, 6, 7))



I expect to have only 2 matches with uniquify=True as I only have 2 
units of the pattern.


GetSubstructMatches() will report all matches of the pattern against 
your molecule. In your case, there are 35 matches which are all 
constituted by different atom indices.



Furthermore, with or without uniquify, I have the same answers.

If you set uniquify=False, you actually get 70 matches, so twice as 
many answers. This time, matches can be constituted by the same 
indices, provided they are in a different permutation.


I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.

Cheers,
p.

I also expected that there should be 2 "independent" lists but 
here, there is always at least one common atom between each list.


Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan






--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/



Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Paolo Tosco

Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
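The gist is not quoted in this thread. As one possible way to approach the question (not necessarily what the gist does), the sketch below takes the 35 matches posted for borneol and searches for a largest set of matches that share no atom. The function name max_disjoint_matches is invented here, and the search is a plain backtracking search whose cost grows quickly with the number of matches.

```python
def max_disjoint_matches(matches):
    """Return a largest list of matches that share no atom index."""
    best = []

    def search(start, chosen, used):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for i in range(start, len(matches)):
            if used.isdisjoint(matches[i]):
                chosen.append(matches[i])
                search(i + 1, chosen, used | set(matches[i]))
                chosen.pop()

    search(0, [], set())
    return best


# The 35 matches reported in the thread for C~C~C(~C)~C against borneol.
borneol_matches = (
    (1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
    (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
    (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
    (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
    (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
    (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
    (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
    (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
    (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7),
)

units = max_disjoint_matches(borneol_matches)
print(len(units), units)
```

For this list the search finds two units that together cover carbons 1 to 10, which is the pair of "independent" matches asked for. The input is just a tuple of index tuples, so the output of mol.GetSubstructMatches(pat) can be passed in directly.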

Cheers,
p.

On 05/05/2020 11:52, Quoc-Tuan DO wrote:


Dear Paolo,

Thank you for your reply.

I understand now... I first ran it without the uniquify option and then 
with uniquify=True only. I thought the default would be uniquify=False.


Actually my problem is to find 2 distinct units of isoprene (pattern) 
in borneol (smiles) as the latter is a monoterpene.


Do you have any idea how I can do this?

Thanks in advance for your time.

Best regards,

QT



On 04/05/2020 at 19:53, Paolo Tosco wrote:


Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:


Dear All,

Please could you help with the following problem (I could not find an 
answer in the discussion list)?


pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'


pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 
10), (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 
9), (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 
1, 7), (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 
3, 10), (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 
4, 9, 10), (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 
3, 4, 9, 10), (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 
4, 3, 2, 8), (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 
4, 3, 2, 8), (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))



I expect to have only 2 matches with uniquify=True as I only have 2 
units of the pattern.


GetSubstructMatches() will report all matches of the pattern against 
your molecule. In your case, there are 35 matches which are all 
constituted by different atom indices.



Furthermore, with or without uniquify, I have the same answers.

If you set uniquify=False, you actually get 70 matches, so twice as 
many answers. This time, matches can be constituted by the same 
indices, provided they are in a different permutation.


I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.

Cheers,
p.

I also expected that there should be 2 "independent" lists but here, 
there is always at least one common atom between each list.


Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan





Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Quoc-Tuan DO

  
  
Dear Paolo,
Thank you for your reply.
I understand now... I first ran it without the uniquify option and then
with uniquify=True only. I thought the default would be uniquify=False.
Actually my problem is to find 2 distinct units of isoprene
(pattern) in borneol (smiles) as the latter is a monoterpene.
Do you have any idea how I can do this?
Thanks in advance for your time.
Best regards,
QT
  





On 04/05/2020 at 19:53, Paolo Tosco wrote:


  
  Dear Quoc-Tuan,
  On 04/05/2020 09:10, Greenpharma S.A.S. wrote:
  


Dear All,

Please could you help with the following problem (I could not
find an answer in the discussion list)?

pattern='C~C~C(~C)~C'
smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'



pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)



The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5,
  4, 9, 10), (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7),
  (2, 3, 4, 5, 9), (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5,
  1, 6), (3, 4, 5, 1, 7), (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6,
  5, 4, 3, 9), (6, 5, 4, 3, 10), (6, 5, 4, 9, 10), (7, 5, 4, 3,
  9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10), (7, 8, 3, 2, 4), (8,
  3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10), (8, 7, 5, 1,
  4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8), (9, 4,
  5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
  (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))


I expect to have only 2 matches with uniquify=True as I only
  have 2 units of the pattern.
  
  GetSubstructMatches() will report all matches of the
pattern against your molecule. In your case, there are 35
matches which are all constituted by different atom indices.
  
Furthermore, with or without uniquify, I have the same
  answers.
  
  If you set uniquify=False, you actually get 70
matches, so twice as many answers. This time, matches can be
constituted by the same indices, provided they are in a different
permutation.
  I have uploaded a gist here:
  https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d
  that hopefully will make this clearer.
  Cheers,
p.
  
  
I also expected that there should be 2 "independent" lists
  but here, there is always at least one common atom between
  each list.

Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan







Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Jean-Marc Nuzillard

Dear Quoc-Tuan,

GetSubstructMatches() tries to find isoprene at all positions where this 
is possible.


You may want to test your SMARTS and its matching with structures at 
this great place:

https://smartsview.zbh.uni-hamburg.de/

Maybe you would prefer to know whether borneol
follows the isoprene rule or not by trying to cover its structure
with two unbound isoprene units.
I really would like to know how to write that with SMARTS.

Jean-Marc


On 04/05/2020 at 10:10, Greenpharma S.A.S. wrote:


Dear All,

Please could you help with the following problem (I could not find an 
answer in the discussion list)?


pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'


pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10), 
(2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9), 
(2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7), 
(3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10), 
(6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10), 
(7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10), 
(8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8), 
(9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8), 
(10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))



I expect to have only 2 matches with uniquify=True as I only have 2 
units of the pattern. Furthermore, with or without uniquify, I have 
the same answers. I also expected that there should be 2 "independent" 
lists but here, there is always at least one common atom between each 
list.


Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan






--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/



Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Paolo Tosco

Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:


Dear All,

Please could you help with the following problem (I could not find an 
answer in the discussion list)?


pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'


pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10), 
(2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9), 
(2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7), 
(3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10), 
(6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10), 
(7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10), 
(8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8), 
(9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8), 
(10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))



I expect to have only 2 matches with uniquify=True as I only have 2 
units of the pattern.


GetSubstructMatches() will report all matches of the pattern against 
your molecule. In your case, there are 35 matches which are all 
constituted by different atom indices.



Furthermore, with or without uniquify, I have the same answers.

If you set uniquify=False, you actually get 70 matches, so twice as many 
answers. This time, matches can be constituted by the same indices, 
provided they are in a different permutation.
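As a rough pure-Python picture of the difference described above (a conceptual sketch only; RDKit's internal implementation may well differ), matches that contain the same atoms in a different order can be collapsed by comparing them as sets. The helper name unique_by_atom_set is invented for this illustration, and the permuted tuples are written out by hand from the symmetry of the pattern's two terminal branches rather than copied from RDKit output.

```python
def unique_by_atom_set(matches):
    """Keep the first match seen for every distinct set of atom indices."""
    seen, unique = set(), []
    for match in matches:
        key = frozenset(match)  # ignores the order of the indices
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Hand-written illustration: each pair lists the same atoms with the
# last two indices swapped, as a non-uniquified search would report them.
permuted_matches = [
    (1, 2, 3, 4, 8), (1, 2, 3, 8, 4),
    (2, 1, 5, 6, 7), (2, 1, 5, 7, 6),
]

unique = unique_by_atom_set(permuted_matches)
print(unique)  # prints [(1, 2, 3, 4, 8), (2, 1, 5, 6, 7)]
```

Note that this only removes permutations of the same atoms. Matches that overlap partially, such as (1, 2, 3, 4, 8) and (1, 5, 4, 3, 9), are still different atom sets and are all kept, which is why 35 matches remain with uniquify=True.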


I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.

Cheers,
p.

I also expected that there should be 2 "independent" lists but here, 
there is always at least one common atom between each list.


Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan





[Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Greenpharma S.A.S.
Dear All,

Please could you help with the following problem (I could not find an answer 
in the discussion list)?

pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'


pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10), (2, 1, 
5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9), (2, 3, 4, 5, 10), 
(2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7), (3, 4, 5, 6, 7), (5, 4, 3, 
2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10), (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), 
(7, 5, 4, 3, 10), (7, 5, 4, 9, 10), (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 
5, 10), (8, 3, 4, 9, 10), (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), 
(9, 4, 3, 2, 8), (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 
2, 8), (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))


I expect to have only 2 matches with uniquify=True as I only have 2 units of 
the pattern. Furthermore, with or without uniquify, I have the same answers. I 
also expected that there should be 2 "independent" lists but here, there is 
always at least one common atom between each list.

Is there something I have misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan


Re: [Rdkit-discuss] GetSubstructMatches() as smiles

2019-08-07 Thread Andrew Dalke
On Aug 7, 2019, at 13:08, Paolo Tosco  wrote:
> You can use
> 
> Chem.MolFragmentToSmiles(mol, match)
> 
> where match is a tuple of atom indices returned by GetSubstructMatch().

Note however that if only the atom indices are given then 
Chem.MolFragmentToSmiles() will include all bonds which connect those atoms, 
even if the original SMARTS does not match those bonds. For example:

>>> from rdkit import Chem
>>> pat = Chem.MolFromSmarts("*~*~*~*") # match 4 linear atoms
>>> mol = Chem.MolFromSmiles("C1CCC1") # ring of size 4
>>> atom_indices = mol.GetSubstructMatch(pat)
>>> atom_indices
(0, 1, 2, 3)
>>> Chem.MolFragmentToSmiles(mol, atom_indices)  # returns the ring
'C1CCC1'


If this is important, then you need to pass the correct bond indices to 
MolFragmentToSmiles(). This can be done by using the bonds in the query graph 
to get the bond indices in the molecule graph. I believe the following is 
correct:

def get_match_bond_indices(query, mol, match_atom_indices):
    bond_indices = []
    for query_bond in query.GetBonds():
        atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
        atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
        bond_indices.append(mol.GetBondBetweenAtoms(
             atom_index1, atom_index2).GetIdx())
    return bond_indices

(Does a function like this already exist in RDKit?)

I'll use it to get the bond indices for the *~*~*~* match:

>>> bond_indices = get_match_bond_indices(pat, mol, atom_indices)
>>> bond_indices
[0, 1, 2]

Passing the atom and bond indices gives the expected match SMILES: 

>>> Chem.MolFragmentToSmiles(mol, atom_indices, bond_indices)
'CCCC'

Cheers,

Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] GetSubstructMatches() as smiles

2019-08-07 Thread Paolo Tosco
Hi Mel,

You can use

Chem.MolFragmentToSmiles(mol, match)

where match is a tuple of atom indices returned by GetSubstructMatch().

Cheers,
p.

> On 7 Aug 2019, at 11:36, Melissa Adasme  wrote:
> 
> Dear rdkitters,
> 
> I'm trying to find substructures (query molecules built from SMARTS) matching 
> my molecules (SMILES). I found the GetSubstructMatches() method which works 
> pretty well returning the indices of matching atoms in my molecule. 
> 
> I wonder if there is a way to directly obtain the SMILES of the found 
> substructures instead of the atom indexes or maybe a way to transform the 
> indexes to smiles?
> 
> Many thanks in advance!
> Mel
> 


[Rdkit-discuss] GetSubstructMatches() as smiles

2019-08-07 Thread Melissa Adasme
Dear rdkitters,

I'm trying to find substructures (query molecules built from SMARTS)
matching my molecules (SMILES). I found the GetSubstructMatches() method
which works pretty well returning the indices of matching atoms in my
molecule.

I wonder if there is a way to directly obtain the SMILES of the found
substructures instead of the atom indexes or maybe a way to transform the
indexes to smiles?

Many thanks in advance!
Mel


Re: [Rdkit-discuss] GetSubstructMatches

2016-12-14 Thread Greg Landrum
Hi Jean-Marc,

The answer is in the error message, once you know how to read it, which
isn't really trivial:

On Wed, Dec 14, 2016 at 5:35 PM, Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

>
> Traceback (most recent call last):
>File "glmap.py", line 11, in 
>  matches = mol.GetSubstructMatches(skel)
> Boost.Python.ArgumentError: Python argument types in
>  Mol.GetSubstructMatches(Mol, str)
> did not match C++ signature:
>  GetSubstructMatches(class RDKit::ROMol self, class RDKit::ROMol
> query, bool uniquify=True, bool useChirality=False, bool
> useQueryQueryMatches=False, unsigned int maxMatches=1000)
>

It's telling you that Mol.GetSubstructMatches was called with a
Mol and a string (the "Mol" is the object you're calling "mol" and the
string is the object you are calling "skel"). It expects, however, to be
called with a Mol and a Mol.

If you convert skel into an RDKit molecule everything should work.
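A minimal sketch of that conversion, assuming skel holds a SMARTS (or SMILES) string; the two structures below are placeholders chosen for illustration, not the ones from glmap.py:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(C)CCO")  # placeholder molecule
skel = "CC(C)CC"                      # placeholder skeleton, still a str

# Passing the string itself triggers the ArgumentError shown above.
# Build a query molecule from it first (Chem.MolFromSmiles would also
# work if the string is a plain SMILES rather than a SMARTS pattern).
skel_mol = Chem.MolFromSmarts(skel)

matches = mol.GetSubstructMatches(skel_mol)
print(matches)
```

The same conversion applies to GetSubstructMatch and HasSubstructMatch, which also expect a molecule object as the query.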

-greg


Re: [Rdkit-discuss] GetSubstructMatches

2016-12-14 Thread Jean-Marc Nuzillard

Sure, it works!

Thanks, Greg.

Jean-Marc

On 14/12/2016 at 17:43, Greg Landrum wrote:

Hi Jean-Marc,

The answer is in the error message, once you know how to read it, 
which isn't really trivial:


On Wed, Dec 14, 2016 at 5:35 PM, Jean-Marc Nuzillard wrote:



Traceback (most recent call last):
   File "glmap.py", line 11, in 
 matches = mol.GetSubstructMatches(skel)
Boost.Python.ArgumentError: Python argument types in
 Mol.GetSubstructMatches(Mol, str)
did not match C++ signature:
 GetSubstructMatches(class RDKit::ROMol self, class RDKit::ROMol
query, bool uniquify=True, bool useChirality=False, bool
useQueryQueryMatches=False, unsigned int maxMatches=1000)


It's telling you that Mol.GetSubstructMatches was called with a Mol 
and a string (the "Mol" is the object you're calling "mol" and the 
string is the object you are calling "skel"). It expects, however, to 
be called with a Mol and a Mol.


If you convert skel into an RDKit molecule everything should work.

-greg




--
Jean-Marc Nuzillard
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/



[Rdkit-discuss] GetSubstructMatches

2016-12-14 Thread Jean-Marc Nuzillard
Hi all,

I have encountered the following problem :

Traceback (most recent call last):
   File "glmap.py", line 11, in 
 matches = mol.GetSubstructMatches(skel)
Boost.Python.ArgumentError: Python argument types in
 Mol.GetSubstructMatches(Mol, str)
did not match C++ signature:
 GetSubstructMatches(class RDKit::ROMol self, class RDKit::ROMol 
query, bool uniquify=True, bool useChirality=False, bool 
useQueryQueryMatches=False, unsigned int maxMatches=1000)

trying to find substructure skel in molecule mol.
I use RDKit under Windows, using Anaconda python and packages.
Presently, I have rdkitVersion 2016.03.1 and boostVersion 1_56.

I get similar messages with GetSubstructMatch and HasSubstructMatch.

Any idea?

All the best,

Jean-Marc

-- 

Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 33 3 26 91 82 10
Fax :33 3 26 91 31 66
http://www.univ-reims.fr/ICMR

http://eos.univ-reims.fr/LSD/
http://eos.univ-reims.fr/LSD/JmnSoft/




Re: [Rdkit-discuss] GetSubstructMatches() and resonance structures

2014-10-31 Thread Paolo Tosco
Thanks to both of you for your replies. That's more or less what I was thinking 
of - I just wanted to make sure that nothing was already available before I 
started coding :-) I will get back to the list once I have something ready.

Cheers,
p.


 On 31 Oct 2014, at 05:17, Greg Landrum greg.land...@gmail.com wrote:
 
 The reply that Ling forwards has one approach to doing this.
 
 It's a bit easier for someone who is willing to do some C++ work.[1]
 
 One could imagine writing a function prepareForResonanceFormMatching(ROMol 
 m) (or some such thing) that would be applied to the *query* molecule that 
 does the following:
 - identifies the groups that need to be resonance-symmetrized
 - changes the resonance bonds to Query bonds that match single or double 
 (possibly also aromatic?)
 - neutralizes any charges on resonating atoms in the group. 
 
 The last step is important because the query C(O)O matches the molecule 
 C(O)[O-] twice, but the query C([O-])O matches it only once:
 
 In [11]: Chem.MolFromSmiles('C(O)[O-]').GetSubstructMatches(Chem.MolFromSmiles('C(O)O'),uniquify=False)
 Out[11]: ((0, 1, 2), (0, 2, 1))
 
 In [12]: Chem.MolFromSmiles('C(O)[O-]').GetSubstructMatches(Chem.MolFromSmiles('C([O-])O'),uniquify=False)
 Out[12]: ((0, 2, 1),)
 
 I suspect such a function would be useful to multiple people.
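 For a single carboxylate the effect of the steps listed above can be imitated by hand with a SMARTS query. This is only a sketch of the idea, not the general function proposed here: the query below was written manually for this one group, with each C-O bond allowed to be single or double and no charge constraint on the oxygens.

```python
from rdkit import Chem

formate = Chem.MolFromSmiles("C([O-])=O")

# Hand-written query for this one group: "-,=" lets each C-O bond be
# single or double, and [#8] does not constrain the formal charge.
query = Chem.MolFromSmarts("[#6](-,=[#8])-,=[#8]")

matches = formate.GetSubstructMatches(query, uniquify=False)
print(matches)
```

 With uniquify=False both oxygen orderings are reported, which is the symmetric result asked for in the original question. A general solution would still need to detect the resonating groups automatically, as outlined above.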
 
 For identifying the groups that are resonance symmetrized: though this could 
 be done using a set of particular patterns, it may be better to think about 
 doing it more generally by having it find resonance systems.[2] The flag 
 Bond.getIsConjugated(), set during sanitization, is probably useful for this. 
  
 
 -greg
 [1] well, to the extent that anything is ever easier in C++
 [2] this would allow finding the substructure matches within molecules like 
 C1=C(C)C=CC=CC=C1
 
 
 On Fri, Oct 31, 2014 at 2:09 AM, S.L. Chan slch...@yahoo.com wrote:
 Dear Paolo,
 
 I have asked a very similar question last year. This was what Greg said.
 
 Ling
 
 Re: [Rdkit-discuss] atom equivalence for substructure matching
  
  
 
  
  
  
  
  
  
 
 From: Paolo Tosco paolo.to...@unito.it
 To: rdkit-discuss@lists.sourceforge.net 
 rdkit-discuss@lists.sourceforge.net 
 Sent: Thursday, October 30, 2014 4:26 PM
 Subject: [Rdkit-discuss] GetSubstructMatches() and resonance structures
 
 Dear all,
 
 The following code snippet compares two resonance structures of formate 
 anion:
 
 import rdkit
 from rdkit import Chem
 
 mol1=Chem.MolFromSmiles('C([O-])=O')
 mol2=Chem.MolFromSmiles('C(=O)[O-]')
 mol1.GetSubstructMatches(mol2, uniquify = False)
 ((0, 2, 1),)
 
 mol1.GetSubstructMatches(mol1, uniquify = False)
 ((0, 1, 2),)
 
 I would rather like to get, in both cases, the following output:
 ((0, 1, 2),(0, 2, 1))
 
 which would account for the carboxylate group symmetry due to resonance. 
 The same applies to amidinium, guanidinium, etc.
 
 Is that currently feasible within the RDKit API?
 
 Thanks in advance, cheers
 Paolo
 
 


[Rdkit-discuss] GetSubstructMatches() and resonance structures

2014-10-30 Thread Paolo Tosco
Dear all,

The following code snippet compares two resonance structures of formate 
anion:

import rdkit
from rdkit import Chem

mol1=Chem.MolFromSmiles('C([O-])=O')
mol2=Chem.MolFromSmiles('C(=O)[O-]')
mol1.GetSubstructMatches(mol2, uniquify = False)
((0, 2, 1),)

mol1.GetSubstructMatches(mol1, uniquify = False)
((0, 1, 2),)

I would rather like to get, in both cases, the following output:
((0, 1, 2),(0, 2, 1))

which would account for the carboxylate group symmetry due to resonance. 
The same applies to amidinium, guanidinium, etc.

Is that currently feasible within the RDKit API?

Thanks in advance, cheers
Paolo
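
One workaround that needs nothing beyond the match tuples is to expand each match over atoms you have already decided are equivalent. The following is a minimal sketch, assuming the caller supplies the equivalence groups by hand (here the two formate oxygens, target indices 1 and 2). The names `expand_matches` and `equivalent_groups` are hypothetical and not part of the RDKit API; only the tuple format returned by GetSubstructMatches() is taken from the message above. Newer RDKit releases also provide Chem.ResonanceMolSupplier, which has its own GetSubstructMatches() and may be the better route; it is not used in this sketch.

```python
from itertools import permutations, product


def expand_matches(matches, equivalent_groups):
    """Permute target atoms within each user-declared equivalence group.

    matches: tuple of match tuples, as returned by GetSubstructMatches().
    equivalent_groups: iterable of sets of target atom indices that the
        caller asserts are interchangeable (an assumption, not checked here).
    """
    expanded = set()
    for match in matches:
        # Positions in the match tuple that hold atoms from each group.
        slots_per_group = [
            [pos for pos, atom in enumerate(match) if atom in group]
            for group in equivalent_groups
        ]
        # Every reordering of those positions, group by group.
        perms_per_group = [list(permutations(slots)) for slots in slots_per_group]
        for combo in product(*perms_per_group):
            new_match = list(match)
            for slots, perm in zip(slots_per_group, combo):
                for dst, src in zip(slots, perm):
                    new_match[dst] = match[src]
            expanded.add(tuple(new_match))
    return tuple(sorted(expanded))


# Match reported above for mol1.GetSubstructMatches(mol2, uniquify=False)
matches = ((0, 2, 1),)
# Assumption: the two oxygens of formate (indices 1 and 2) are equivalent.
print(expand_matches(matches, [{1, 2}]))  # ((0, 1, 2), (0, 2, 1))
```

Note that this only reorders atoms already present in a match; it will not swap in an equivalent atom that the original match left out.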




Re: [Rdkit-discuss] GetSubstructMatches()

2013-08-23 Thread S.L. Chan
Indeed with the latest release there is no problem any more.

Ling




From: Greg Landrum greg.land...@gmail.com
To: S.L. Chan slch...@yahoo.com
Cc: rdkit-discuss@lists.sourceforge.net
Sent: Thursday, August 22, 2013 6:57 PM
Subject: Re: [Rdkit-discuss] GetSubstructMatches()
 


Dear Ling,

On Thu, Aug 22, 2013 at 11:49 PM, S.L. Chan slch...@yahoo.com wrote:

> Good afternoon folks,
>
> I would imagine that if you remove the hydrogens, the
> resulting molecule would be a substructure of the
> original molecule. However, when I do the following
> to the attached MDL mol file, there are no matches.

I would expect it to be a substructure.

>  from rdkit import Chem
>  mol = Chem.MolFromMolFile('temp.mol', removeHs=False)
>  mhvy = Chem.RemoveHs(mol)
>  matches = mol.GetSubstructMatches(mhvy)
>
> matches turns out to be empty.

Which version of the RDKit are you using? I cannot reproduce this with the
most recent release:

In [5]: m = Chem.MolFromMolFile('temp.mol',removeHs=False)

In [6]: mhvy = Chem.RemoveHs(m)

In [7]: len(m.GetSubstructMatches(mhvy))
Out[7]: 1

In [8]: from rdkit import rdBase

In [9]: rdBase.rdkitVersion
Out[9]: '2013.06.1'

-greg



[Rdkit-discuss] GetSubstructMatches()

2013-08-22 Thread S.L. Chan
Good afternoon folks,

I would imagine that if you remove the hydrogens, the
resulting molecule would be a substructure of the
original molecule. However, when I do the following
to the attached MDL mol file, there are no matches.

 from rdkit import Chem
 mol = Chem.MolFromMolFile('temp.mol', removeHs=False)
 mhvy = Chem.RemoveHs(mol)
 matches = mol.GetSubstructMatches(mhvy)

matches turns out to be empty.

Is it something to do with the difference between SMARTS
and SMILES? If so, how can I work around this to obtain
the atom index relationship between the two molecules?

Thank you for your insight.

Ling
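
On the index relationship asked about above: each tuple returned by GetSubstructMatches() is ordered by query atom index, so match[i] is the index, in the molecule being searched, of query atom i. With mhvy as the query and mol as the target, that gives the heavy-atom to original-atom mapping directly. The following is a minimal sketch of turning one match into lookup tables in both directions; the match tuple is made-up illustrative data (not taken from temp.mol), and the helper names are hypothetical.

```python
def query_to_target(match):
    """Map query atom index -> target atom index for one match tuple."""
    return dict(enumerate(match))


def target_to_query(match):
    """Map target atom index -> query atom index for one match tuple."""
    return {target_idx: query_idx for query_idx, target_idx in enumerate(match)}


# Illustrative match only: query atoms 0, 1, 2 found at target atoms 1, 3, 4.
match = (1, 3, 4)
print(query_to_target(match))
print(target_to_query(match))
```

Target atoms that do not appear in the match (for example the explicit hydrogens) are simply absent from the second dictionary.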


temp.mol
Description: Binary data


Re: [Rdkit-discuss] GetSubstructMatches()

2013-08-22 Thread Greg Landrum
Dear Ling,

On Thu, Aug 22, 2013 at 11:49 PM, S.L. Chan slch...@yahoo.com wrote:

> Good afternoon folks,
>
> I would imagine that if you remove the hydrogens, the
> resulting molecule would be a substructure of the
> original molecule. However, when I do the following
> to the attached MDL mol file, there are no matches.

I would expect it to be a substructure.

>  from rdkit import Chem
>  mol = Chem.MolFromMolFile('temp.mol', removeHs=False)
>  mhvy = Chem.RemoveHs(mol)
>  matches = mol.GetSubstructMatches(mhvy)
>
> matches turns out to be empty.


Which version of the RDKit are you using? I cannot reproduce this with the
most recent release:

In [5]: m = Chem.MolFromMolFile('temp.mol',removeHs=False)

In [6]: mhvy = Chem.RemoveHs(m)

In [7]: len(m.GetSubstructMatches(mhvy))
Out[7]: 1

In [8]: from rdkit import rdBase

In [9]: rdBase.rdkitVersion
Out[9]: '2013.06.1'

-greg