Re: [Rdkit-discuss] How to transform SMARTS of aromatic structures so that their aromatic atoms could be any?

2017-05-19 Thread Alexis Parenty
Hi Christos, thank you so much!

Your approach is much simpler and quicker than mine, and it works with
polycyclic compounds. I did try the "a" notation at first, but ChemDraw could
not draw the SMARTS I was creating with it. I assumed I was doing something
wrong and that the more complicated “:[*]” notation was the only way. Your
script produces valid SMARTS even if ChemDraw does not recognize them. You
saved me a lot of time.

Thanks again,

Alexis

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How to transform SMARTS of aromatic structures so that their aromatic atoms could be any?

2017-05-19 Thread Christos Kannas
Hi Alexis,

In SMARTS you can define an aromatic atom with "a".
So I think something like the following might produce more correct
generalised SMARTS patterns.

https://gist.github.com/CKannas/7a9e2768461260461155257fd30c2152

*Note: Please check if the chemistry is correct.*
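
For illustration only (this is not the content of the gist above), here is a
minimal sketch of the idea: split the SMARTS into tokens and replace each
aromatic atom with "a". The function name, the token pattern, and the choice
to treat only c, n, o, s and simple bracket forms such as [nH] as aromatic
are assumptions. Recursive SMARTS and other aromatic elements are not handled.

```python
import re

# Tokens: bracket atoms, two-letter halogens, then any other single character.
TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|.")
# Assumption: only bare c/n/o/s and simple bracket forms such as [nH] count
# as aromatic atoms here.
AROMATIC = re.compile(r"[cnos]|\[[cnos]H?\]")


def aromatic_atoms_to_any(smarts):
    """Replace each aromatic atom token with the SMARTS primitive 'a'."""
    return "".join(
        "a" if AROMATIC.fullmatch(tok) else tok
        for tok in TOKEN.findall(smarts)
    )


print(aromatic_atoms_to_any("[*]c1coc(Cl)n1"))  # [*]a1aaa(Cl)a1
print(aromatic_atoms_to_any("Clc1cc[nH]c1"))    # Cla1aaaa1
```

Tokenising first means "Cl" and "Br" are never split, and ring-closure digits
and branches pass through unchanged, so polycyclic inputs need no special
handling. Note that "[nH]" loses its hydrogen constraint when it becomes "a".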

Best,

Christos

Christos Kannas

Researcher
Ph.D Student





[Rdkit-discuss] How to transform SMARTS of aromatic structures so that their aromatic atoms could be any?

2017-05-19 Thread Alexis Parenty
Hi everyone,


I need a function that generalizes the aromatic rings of a SMARTS, so that
their aromatic atoms can be any atom:

[image: Inline images 1]


I have noticed that it is possible to rearrange most SMARTS strings into
a general aromatic SMARTS string by following these simple rules:

1. Replace every lowercase (aromatic) atom symbol in the SMARTS string
   with “:[*]”.

2. Fix the two cycle junctions of each ring:

   a. Where a number (1-9) appears for the first time in the string, insert
      a colon after the digit (for example, “[*]1” becomes “[*]1:”).

   b. Where the same number appears a second time, move the colon before
      the digit (for example, “[*]1:” becomes “[*]:1”).


I have written a function (see below) that works fine with any SMARTS
containing a single aromatic ring, but it gets buggy when the SMARTS has
more than one aromatic ring:



[image: Inline images 2]



def get_aromatic_generalised_smarts(smarts):
    # Mark every aromatic atom with the placeholder "x".
    for arom_atom in ("c", "o", "n", "s"):
        smarts = smarts.replace(arom_atom, "x")
    smarts = smarts.replace("[xH]", "x")  # to take care of explicit hydrogen atoms

    # Rule 1: every aromatic atom becomes ":[*]".
    smarts = smarts.replace("x", ":[*]")

    for char in smarts:
        if char.isdigit():
            if ("[*]" + char) in smarts:
                for cycle_junction in ("[*]1", "[*]2", "[*]3", "[*]4", "[*]5",
                                       "[*]6", "[*]7", "[*]8", "[*]9"):
                    # This makes the second cycle junction OK but introduces
                    # an error in the first one, which is corrected below.
                    smarts = smarts.replace(cycle_junction,
                                            "[*]:" + cycle_junction[-1])
                # Correct the first cycle junction.
                smarts = smarts.replace(":[*]:" + char, "[*]" + char, 1)
                break
    return smarts


print(get_aromatic_generalised_smarts("[*]c1coc(Cl)n1"))
print(get_aromatic_generalised_smarts("[*]c1coc(Cl)c1"))

print(get_aromatic_generalised_smarts("[*]c1coc(Cl)c1Cc2c2"))


Am I heading in the right direction? I can't get my head around SMARTS
with more than one aromatic ring...

Maybe regular expressions would be more appropriate? Maybe there is an
RDKit function that does the trick from a mol object?
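
On the regular-expression idea, here is a rough sketch rather than a tested
solution: tokenise the SMARTS with a regular expression, then apply the two
rules above token by token, so that several rings are handled in one pass.
The function name, the token pattern, and the restriction to c, n, o, s and
simple bracket forms such as [nH] are assumptions. Recursive SMARTS is not
handled.

```python
import re

# Tokens: bracket atoms, two-letter halogens, ring-closure labels, then any
# other single character (atoms, bonds, branches, dots).
TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|%\d\d|\d|.")
# Assumption: only bare c/n/o/s and simple bracket forms such as [nH] count
# as aromatic atoms, as in the function above.
AROMATIC = re.compile(r"[cnos]|\[[cnos]H?\]")
BOND_CHARS = set("-=#:~/\\@!&,;")


def generalise_aromatic_bonds(smarts):
    out = []
    prev = False      # is the atom that the next token attaches to aromatic?
    explicit = False  # was a bond symbol written just before this token?
    branches = []     # saved `prev` values for open parentheses
    open_rings = {}   # ring label -> True if its closure needs a ":"
    for tok in TOKEN.findall(smarts):
        if tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok == ".":
            prev = False
        elif tok in BOND_CHARS:
            explicit = True
        elif tok[0].isdigit() or tok[0] == "%":
            if tok in open_rings:
                # Second appearance: the colon goes before the ring label.
                if open_rings.pop(tok) and prev and not explicit:
                    out.append(":")
            else:
                # First appearance: remember if the opening atom is aromatic.
                open_rings[tok] = prev and not explicit
            explicit = False
        else:
            # Atom token: aromatic atoms become "[*]", joined by ":" to an
            # aromatic neighbour unless a bond was written explicitly.
            aromatic = AROMATIC.fullmatch(tok) is not None
            if aromatic and prev and not explicit:
                out.append(":")
            out.append("[*]" if aromatic else tok)
            prev = aromatic
            explicit = False
            continue
        out.append(tok)
    return "".join(out)


print(generalise_aromatic_bonds("[*]c1coc(Cl)n1"))
# [*][*]1:[*]:[*]:[*](Cl):[*]:1
print(generalise_aromatic_bonds("c1ccccc1-c2ccccc2"))
# [*]1:[*]:[*]:[*]:[*]:[*]:1-[*]2:[*]:[*]:[*]:[*]:[*]:2
```

One caveat: two aromatic atoms written next to each other with no bond symbol
are always joined with ":", so a single bond between two rings is preserved
only if it is written explicitly (for example with "-").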


Thanks,


Alexis