Hi Martin,
Following on from John’s suggestion, you might like to have a look at the
following class in the new version of SMSD:
https://github.com/asad/SMSD/blob/master/src/org/openscience/smsd/AtomAtomMapping.java
Best wishes,
Asad
> On 2 Dec 2014, at 19:59, cdk-user-requ...@lists.sourceforge.net wrote:
>
> Today's Topics:
>
> 1. Re: how to print SMARTS pattern without hydrogens (John May)
> 2. Re: how to print SMARTS pattern without hydrogens
> (Nina Jeliazkova)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 2 Dec 2014 19:47:57 +0000
> From: John May <john.wilkinson...@gmail.com>
> Subject: Re: [Cdk-user] how to print SMARTS pattern without hydrogens
> To: Martin Gütlein <guetl...@posteo.de>
> Cc: cdkuser <cdk-user@lists.sourceforge.net>
> Message-ID: <c4b8de7d-936e-4f6b-bd32-ce117d4bc...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Just to clarify: you can write SMILES in CDK, so you're writing SMILES and
> then interpreting it as SMARTS. CDK doesn't have the ability to write SMARTS.
> As well as hydrogens, you may also have trouble with aromaticity, charges, and
> isotopes.
>
>>> c(c[cH])c[cH]
>
> Is probably better as [#6]([#6][#6])[#6][#6].
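
To make the rewrite above concrete, here is a minimal, string-level sketch that turns a SMILES fragment into an element-only pattern. It is an illustration only, not CDK functionality: the class ElementSmarts and the method toElementSmarts are hypothetical names, it knows just a handful of elements, and it deliberately drops hydrogen counts, charges, isotopes and aromaticity, so [#6] will match aliphatic as well as aromatic carbon.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of CDK): SMILES fragment -> element-only SMARTS. */
final class ElementSmarts {
    // Assumption: only these elements are supported; anything else is rejected.
    private static final Map<String, Integer> ATOMIC_NUMBER = new HashMap<>();
    static {
        ATOMIC_NUMBER.put("B", 5);
        ATOMIC_NUMBER.put("C", 6);
        ATOMIC_NUMBER.put("N", 7);
        ATOMIC_NUMBER.put("O", 8);
        ATOMIC_NUMBER.put("F", 9);
        ATOMIC_NUMBER.put("P", 15);
        ATOMIC_NUMBER.put("S", 16);
        ATOMIC_NUMBER.put("Cl", 17);
        ATOMIC_NUMBER.put("Br", 35);
        ATOMIC_NUMBER.put("I", 53);
    }

    static String toElementSmarts(String smiles) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            if (c == '[') {
                // Bracket atom: keep the element, drop isotope, H count and charge.
                int close = smiles.indexOf(']', i);
                if (close < 0) throw new IllegalArgumentException("unclosed bracket atom");
                int j = i + 1;
                while (j < close && Character.isDigit(smiles.charAt(j))) j++; // skip isotope
                if (j >= close) throw new IllegalArgumentException("bracket atom without element");
                int len = symbolLength(smiles, j, close);
                // A trailing lower-case letter means a two-letter element we do not know.
                if (j + len < close && Character.isLowerCase(smiles.charAt(j + len)))
                    throw new IllegalArgumentException("unsupported atom: " + smiles.substring(i, close + 1));
                out.append(atomExpr(smiles.substring(j, j + len)));
                i = close + 1;
            } else if (Character.isLetter(c)) {
                // Organic-subset atom written without brackets, e.g. c, N, Cl.
                int len = symbolLength(smiles, i, smiles.length());
                out.append(atomExpr(smiles.substring(i, i + len)));
                i += len;
            } else {
                // Bonds, branches and ring-closure digits are copied unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Two characters for Cl and Br, one character otherwise.
    private static int symbolLength(String s, int start, int end) {
        if (start + 1 < end && ATOMIC_NUMBER.containsKey(s.substring(start, start + 2))) return 2;
        return 1;
    }

    // Aromatic lower-case symbols map to the same atomic number as upper-case ones.
    private static String atomExpr(String symbol) {
        String key = Character.toUpperCase(symbol.charAt(0)) + symbol.substring(1);
        Integer z = ATOMIC_NUMBER.get(key);
        if (z == null) throw new IllegalArgumentException("unsupported atom: " + symbol);
        return "[#" + z + "]";
    }

    public static void main(String[] args) {
        // prints [#6]([#6][#6])[#6][#6]
        System.out.println(toElementSmarts("c(c[cH])c[cH]"));
    }
}
```

Working on the string is only reasonable for simple fragments like the one in this thread; for anything more involved, editing the atoms before generating the SMILES is likely the more robust route.
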
>
> The reason you're having trouble in CDK 1.5 is that SMILES IO now handles
> valence correctly.
>
> Anyway, there are several possible solutions:
>
> 1) Reset the hydrogen counts to their defaults (i.e. atom typing). This will
> work for your examples, but you would also lose the aromaticity flags (i.e.
> the example above isn't a ring), and it wouldn't fix nitrogens, which also
> have H displayed when aromatic. I would not recommend this.
> 2) Set all hydrogen counts to 0 (not null!) before generating the SMILES; you
> may also want to do the same for charge and mass. Simply loop over the MCS
> and set the implicit H count to 0. removeHydrogens has no effect because the
> hydrogens are not explicit; see
> http://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/
>
> 3) After parsing the SMILES as SMARTS, traverse the expression tree of each
> atom and replace And(<OtherSmartsAtom>, HydrogenCount) with
> <OtherSmartsAtom>.
> 4) Load the SMILES as SMILES and do a normal subgraph match, as opposed to a
> SMARTS match.
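
The tree rewrite in option 3 can be pictured with a small stand-alone sketch. The types below (Expr, Element, Aromatic, HydrogenCount, And) are hypothetical stand-ins invented for this illustration, not CDK's SMARTS query classes, so treat it as the shape of the traversal rather than code to paste into a CDK project.

```java
/** Hypothetical, minimal atom-expression tree; these are NOT CDK's classes. */
interface Expr {}

final class Element implements Expr {
    final int atomicNumber;
    Element(int atomicNumber) { this.atomicNumber = atomicNumber; }
    @Override public String toString() { return "#" + atomicNumber; }
}

final class Aromatic implements Expr {
    @Override public String toString() { return "a"; }
}

final class HydrogenCount implements Expr {
    final int count;
    HydrogenCount(int count) { this.count = count; }
    @Override public String toString() { return "H" + count; }
}

final class And implements Expr {
    final Expr left, right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return left + "&" + right; }
}

final class StripHydrogenCount {
    /**
     * Returns the expression with And(x, HydrogenCount) collapsed to x.
     * A HydrogenCount that is not inside an And is returned unchanged.
     */
    static Expr strip(Expr e) {
        if (!(e instanceof And)) return e;
        And and = (And) e;
        Expr left = strip(and.left);   // rewrite the children first
        Expr right = strip(and.right);
        if (right instanceof HydrogenCount) return left;
        if (left instanceof HydrogenCount) return right;
        return new And(left, right);
    }

    public static void main(String[] args) {
        Expr atom = new And(new And(new Element(6), new Aromatic()), new HydrogenCount(1));
        // prints [#6&a&H1] -> [#6&a]
        System.out.println("[" + atom + "] -> [" + strip(atom) + "]");
    }
}
```

The recursion handles the children before the parent, so a hydrogen count nested several levels down is removed as well.
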
>
> Also:
> - Make sure you use the new SMSD (not part of CDK); the CDK packages are
> quite old.
> - Avoid using DefaultChemObjectBuilder and use SilentChemObjectBuilder
> instead (the naming is the wrong way round, but Silent is actually better
> as it doesn't fire off events).
> - You're generating canonical SMILES when this isn't needed; use
> SmilesGenerator.generic().aromatic() when creating the SmilesGenerator.
>
> J
>
> On Dec 2, 2014, at 11:04 AM, Martin Gütlein <guetl...@posteo.de> wrote:
>
>> Hi,
>>
>> any help with this issue would be very much appreciated,
>>
>> Kind regards,
>> Martin
>>
>> -------- Original Message --------
>> Subject: Re: how to print SMARTS pattern without hydrogens
>> Date: 02.12.2014 12:00
>> On 30 September 2014 at 09:30, Martin Guetlein
>> <martin.guetl...@googlemail.com> wrote:
>>> Hi,
>>>
>>> I am currently migrating from CDK 1.4 to 1.5. I am mining the maximum
>>> common subgraph of two compounds and then printing the resulting fragment
>>> as SMARTS. This works in 1.4, but in 1.5 the SmilesGenerator adds
>>> unwanted hydrogens. How can I get rid of them?
>>> See the example below.
>>> See also
>>> https://www.mail-archive.com/cdk-user@lists.sourceforge.net/msg02597.html
>>>
>>> Thanks and kind regards,
>>> Martin
>>>
>>> The following code prints "mcs: c(c[cH])c[cH]" instead of "mcs: ccccc"
>>> [[
>>> SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
>>> IAtomContainer mol1 = sp.parseSmiles("c1ccccc1NC");
>>> IAtomContainer mol2 = sp.parseSmiles("c1cccnc1");
>>> org.openscience.cdk.smsd.Isomorphism mcsFinder =
>>>     new org.openscience.cdk.smsd.Isomorphism(
>>>         org.openscience.cdk.smsd.interfaces.Algorithm.DEFAULT, true);
>>> mcsFinder.init(mol1, mol2, true, true);
>>> mcsFinder.setChemFilters(true, true, true);
>>>
>>> mol1 = mcsFinder.getReactantMolecule();
>>> IAtomContainer mcsmolecule = DefaultChemObjectBuilder.getInstance()
>>>     .newInstance(IAtomContainer.class, mol1);
>>> List<IAtom> atomsToBeRemoved = new ArrayList<IAtom>();
>>> for (IAtom atom : mcsmolecule.atoms()) {
>>>     int index = mcsmolecule.getAtomNumber(atom);
>>>     if (!mcsFinder.getFirstMapping().containsKey(index))
>>>         atomsToBeRemoved.add(atom);
>>> }
>>> for (IAtom atom : atomsToBeRemoved)
>>>     mcsmolecule.removeAtomAndConnectedElectronContainers(atom);
>>>
>>> // has no effect
>>> // mcsmolecule = AtomContainerManipulator.removeHydrogens(mcsmolecule);
>>>
>>> SmilesGenerator g = new SmilesGenerator().aromatic();
>>> System.out.println("mcs: " + g.create(mcsmolecule));
>>> ]]
>>>
>>> --
>>> Dipl-Inf. Martin Gütlein
>>> Phone:
>>> +49 (0)761 203 8442 (office)
>>> +49 (0)177 623 9499 (mobile)
>>> Email:
>>> guetl...@informatik.uni-freiburg.de
>>
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
> ------------------------------
>
> Message: 2
> Date: Tue, 2 Dec 2014 21:59:21 +0200
> From: Nina Jeliazkova <jeliazkova.n...@gmail.com>
> Subject: Re: [Cdk-user] how to print SMARTS pattern without hydrogens
> To: John May <john.wilkinson...@gmail.com>
> Cc: cdkuser <cdk-user@lists.sourceforge.net>
> Message-ID:
> <CAE5qDd1RY94oue1B6bW3cE6FZa8HpV0RqUP6jJDT3=gmtih...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Martin,
>
> Following John's comments, I've realized you might consider writing SMARTS
> via the SmartsHelper.toSmarts() method from the ambit2-smarts package (it
> takes a QueryAtomContainer):
>
> http://ambit.sourceforge.net/AMBIT2-LIBS/ambit2-smarts/apidocs/ambit2/smarts/SmartsHelper.html#toSmarts(org.openscience.cdk.isomorphism.matchers.QueryAtomContainer)
>
>
> Regards,
> Nina
>
> End of Cdk-user Digest, Vol 101, Issue 2
> ****************************************