Hi Martin,

By design CDK separates out standardisation (hydrogen representation,
tautomers, protonation) from canonicalisation (ordering). You've found the
method to "sprout" hydrogens but you actually want the opposite,
suppressHydrogens(mol):
<http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/tools/manipulator/AtomContainerManipulator.html#suppressHydrogens(org.openscience.cdk.interfaces.IAtomContainer)>
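A minimal sketch of that, assuming the CDK 1.5 API linked above is on the classpath. The class and method names (SuppressHydrogensExample, uniqueSmilesWithoutExplicitH) are made up for illustration. It uses the return value of suppressHydrogens rather than relying on in-place modification, since that behaviour has differed between CDK versions.

```java
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.tools.manipulator.AtomContainerManipulator;

// Hypothetical helper class, not part of the CDK.
public class SuppressHydrogensExample {

    // Parse a SMILES, convert suppressible explicit hydrogens to implicit
    // ones, then generate the unique SMILES.
    public static String uniqueSmilesWithoutExplicitH(String smi) throws CDKException {
        SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer mol = parser.parseSmiles(smi);
        // Use the returned container; do not assume 'mol' was changed in place.
        IAtomContainer suppressed = AtomContainerManipulator.suppressHydrogens(mol);
        return SmilesGenerator.unique().create(suppressed);
    }

    public static void main(String[] args) throws CDKException {
        // The two inputs from Martin's question: explicit vs implicit hydrogen.
        for (String smi : new String[] { "[H]Cl", "Cl" }) {
            System.out.println(smi + " -> " + uniqueSmilesWithoutExplicitH(smi));
        }
    }
}
```

With the hydrogens suppressed first, both inputs should give the same unique SMILES. Hydrogens that carry information (for example isotopes or [H][H]) are expected to stay explicit.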

I think you might be conflating two things. Although it is possible to
combine them, you should perhaps store and index (uniqueness check) your
structures separately. For your example you could assign a unique tautomer,
but you'll be back at square one with your first example.

O=C[CH]1Cc2[nH]cnc2CC1 -> 1,3 proton shift -> C(=O)[CH]1Cc2c(CC1)[nH]cn2
CC(=O)N -> 1,3 proton shift -> CC(O)=N
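One way to picture "store and index separately" is the plain-Java sketch below. It makes no CDK calls. StructureStore and its key function are hypothetical names for illustration only. The stored value is the structure exactly as you received it. The key is whatever standardised, canonical string you choose for the uniqueness check (for instance a unique SMILES generated after hydrogen suppression and tautomer normalisation).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: keeps the original structure string as the
// stored value and uses a separately computed key only for uniqueness.
public class StructureStore {

    private final Function<String, String> keyFunction;
    private final Map<String, String> storedByKey = new LinkedHashMap<>();

    // keyFunction is an assumption: it should map equivalent structures
    // to the same string (standardise first, then canonicalise).
    public StructureStore(Function<String, String> keyFunction) {
        this.keyFunction = keyFunction;
    }

    // Returns true if the structure was new, false if an equivalent
    // structure (same key) was already stored.
    public boolean add(String structure) {
        String key = keyFunction.apply(structure);
        return storedByKey.putIfAbsent(key, structure) == null;
    }

    // Returns the representation that was stored first for this key,
    // or null if nothing equivalent has been stored.
    public String stored(String structure) {
        return storedByKey.get(keyFunction.apply(structure));
    }

    public int size() {
        return storedByKey.size();
    }
}
```

The point of the design is that changing the standardisation rules later only means recomputing the keys. The stored structures stay untouched.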

Thanks,


Regards,
John W May
john.wilkinson...@gmail.com

On 3 September 2015 at 09:27, Martin Gütlein <guetl...@uni-mainz.de> wrote:

> One more thing:
> I noticed that unique SMILES differentiate between explicit and implicit
> hydrogens, e.g. "[H]Cl" is different from "Cl". This can be solved by
> running AtomContainerManipulator.convertImplicitToExplicitHydrogens(mol).
> However, I do not like having all my Hs defined explicitly. Is there an
> option in the CDK to convert explicit Hs back to implicit, leaving only
> those Hs explicit that are relevant?
>
> Martin
>
>
> Am 03.09.2015 um 09:48 schrieb Martin Gütlein:
>
> Hi John,
>
> thanks for your reply, I tried to use unique (kekulized) SMILES instead of
> InChIs.
> What's good is that the structure of (most) compounds is stored correctly
> (i.e., I can create an IAtomContainer that is apparently equal).
>
> However, I found an example where the unique SMILES of two identical
> structures are different (see below).
>
> Kind regards,
> Martin
>
>
>     for (String smi : new String[] { "O=C[CH]1Cc2[nH]cnc2CC1",
> "C(=O)[CH]1Cc2c(CC1)[nH]cn2" })
>         {
>             IAtomContainer mol = new
> SmilesParser(SilentChemObjectBuilder.getInstance()).parseSmiles(smi);
>             System.out.println(SmilesGenerator.unique().create(mol));
>         }
>
>
>
>
>
>
> Am 02.09.2015 um 20:58 schrieb John M:
>
> Just to add on - if you really want to use InChI (don't) then you could
> store the AuxInfo but the CDK doesn't have a conversion method that accepts
> it when turning it back into an AtomContainer.
>
> I also notice you're using unique SMILES (the default in the old APIs). You
> probably want isomeric SMILES, which are non-canonical but store
> stereochemistry.
>
> String smi = SmilesGenerator.isomeric().create(container);
>
> John
>
> Regards,
> John W May
> john.wilkinson...@gmail.com
>
> On 2 September 2015 at 19:54, John M <john.wilkinson...@gmail.com> wrote:
>
>> Hi Martin,
>>
>> The InChI is an identifier, not a structure representation, and it should
>> never be used as one. For maximum preservation you should store compounds
>> as Kekulé SMILES or Molfile. You can store additional data such as
>> coordinates supplementary to the SMILES.
>>
>> You might find a recent presentation by Noel (Open Babel) and Rajarshi
>> (CDK) useful:
>>
>> http://baoilleach.blogspot.co.uk/2015/08/the-whole-of-cheminformatics-best.html
>>
>> John
>>
>>
>
>
> --
> Dr. Martin Gütlein
> Phone:+49 (0)6131 39 23336 (office)+49 (0)177 623 9499 (mobile)
> Email:guetl...@uni-mainz.de
>
>
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
