In fact, if I convert a structure to the aromatic form with ChemAxon (see
below), I get bond type 4 for the aromatic bonds. When I run your code on
that file, both of the aforementioned methods complain that the implicit
hydrogen count is not set (see below).
You mentioned I should use the 'General' aromaticity model instead of
'Basic'. What do you mean here? Sorry, I am new to this aspect of CDK.
Thanks,
ptrn1.matches(mol) returns:

Exception in thread "main" java.lang.NullPointerException: Implicit hydrogen count was not set.
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
    at org.openscience.cdk.isomorphism.matchers.smarts.SMARTSAtomInvariants.configureDaylight(SMARTSAtomInvariants.java:292)
    at org.openscience.cdk.isomorphism.matchers.smarts.SMARTSAtomInvariants.configureDaylightWithRingInfo(SMARTSAtomInvariants.java:247)
    at org.openscience.cdk.isomorphism.matchers.smarts.SmartsMatchers.prepare(SmartsMatchers.java:51)
    at org.openscience.cdk.smiles.smarts.SmartsPattern.matchAll(SmartsPattern.java:150)
    at org.openscience.cdk.smiles.smarts.SmartsPattern.match(SmartsPattern.java:116)
    at org.openscience.cdk.isomorphism.Pattern.matches(Pattern.java:75)

test
Mrv15c1405191613332D
17 19 0 0 0 0 999 V2000
-0.3869 0.2595 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.1014 -0.1530 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.1014 -0.9780 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3869 -1.3905 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3276 -0.9780 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3276 -0.1530 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0420 -1.3905 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.7565 -0.9780 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.7565 -0.1530 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0420 0.2595 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.4710 0.2595 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1855 -0.1530 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.9000 0.2595 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.9000 1.0845 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1855 1.4970 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4710 1.0845 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0420 -2.2155 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
2 3 4 0 0 0 0
3 4 4 0 0 0 0
4 5 4 0 0 0 0
5 6 4 0 0 0 0
1 6 4 0 0 0 0
7 8 1 0 0 0 0
8 9 1 0 0 0 0
9 10 1 0 0 0 0
5 7 1 0 0 0 0
6 10 1 0 0 0 0
12 13 4 0 0 0 0
13 14 4 0 0 0 0
14 15 4 0 0 0 0
15 16 4 0 0 0 0
11 12 4 0 0 0 0
11 16 4 0 0 0 0
9 11 1 0 0 0 0
7 17 2 0 0 0 0
M END
$$$$
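
To double-check which bonds in a record like the one above carry the
aromatic code, here is a minimal sketch in plain Java (no CDK) that tallies
the bond-type column of a V2000 bond block. The class and method names are
hypothetical, and it assumes the fields are whitespace-separated, which holds
for small records but not for the fixed-width format in general. For the
record above it should report 12 bonds of type 4, 6 of type 1 and 1 of type 2.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of CDK: counts bonds per V2000 type code
// (1 = single, 2 = double, 3 = triple, 4 = aromatic).
class BondTypeTally {

    static Map<Integer, Integer> tally(String molfile) {
        String[] lines = molfile.split("\\R");

        // The counts line is the one that ends with the "V2000" version stamp.
        int countsIdx = -1;
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().endsWith("V2000")) {
                countsIdx = i;
                break;
            }
        }
        if (countsIdx < 0) {
            throw new IllegalArgumentException("no V2000 counts line found");
        }

        // Assumption: atom and bond counts are separated by whitespace.
        String[] counts = lines[countsIdx].trim().split("\\s+");
        int nAtoms = Integer.parseInt(counts[0]);
        int nBonds = Integer.parseInt(counts[1]);

        // The bond block starts right after the atom block.
        Map<Integer, Integer> result = new TreeMap<>();
        int first = countsIdx + 1 + nAtoms;
        for (int i = first; i < first + nBonds; i++) {
            String[] fields = lines[i].trim().split("\\s+");
            int type = Integer.parseInt(fields[2]); // third field is the bond type
            result.merge(type, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up record: one aromatic bond and one double bond.
        String mol = String.join("\n",
            "demo",
            "  hypothetical",
            "",
            "  3  2  0  0  0  0            999 V2000",
            "    0.0000    0.0000    0.0000 C   0  0",
            "    1.0000    0.0000    0.0000 C   0  0",
            "    2.0000    0.0000    0.0000 O   0  0",
            "  1  2  4  0",
            "  2  3  2  0",
            "M  END");
        System.out.println(tally(mol)); // prints {2=1, 4=1}
    }
}
```

A type-4 bond carries no bond order, which is probably why the implicit
hydrogen counts cannot simply be derived when such a file is read.
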
On Thu, May 19, 2016 at 1:13 PM, John M <john.wilkinson...@gmail.com> wrote:
> Yep, SmartsPattern delegates to VF below but does all required setup such
> as arom perception and invariants required for SMARTS matching:
>
> https://github.com/cdk/cdk/blob/master/tool/smarts/src/main/java/org/openscience/cdk/smiles/smarts/SmartsPattern.java#L142-L172
>
> If you generated them with ChemAxon you should use the 'General'
> aromaticity model rather than 'Basic'.
>
> John
>
>
>
> Regards,
> John W May
> john.wilkinson...@gmail.com
>
> On 19 May 2016 at 19:41, Yannick .Djoumbou <y.djoum...@gmail.com> wrote:
>
>> Hi John,
>>
>> Thanks a lot for the quick answer. I will be switching to the new tools.
>> I tried both SmartsPattern and VentoFoggia, and both are working for me.
>> If I understood correctly, VentoFoggia is more suitable if I want to run
>> substructure matching on a large scale?
>> I also realized that some of my SMARTS patterns should be modified a bit.
>> It is cumbersome when they are generated with one tool and tested/used
>> with another.
>>
>> Thanks again.
>>
>> Best,
>>
>> Yannick
>>
>> On Thu, May 19, 2016 at 1:51 AM, John M <john.wilkinson...@gmail.com>
>> wrote:
>>
>>> Hi Yannick,
>>>
>>> This should be much simpler now. First off, you're using some old APIs:
>>> SQT (SMARTSQueryTool) still works, but it's now preferred to go through
>>> 'Pattern'. SmartsPattern does all the setup needed; other implementations
>>> can be faster and more customisable (see later) if you have many SMARTS
>>> to match against one molecule, but they are only recommended if needed.
>>> The SMSD classes are specific to SMSD, so unless you need MCS don't use
>>> them.
>>>
>>> I've attached the code below, but I think the real problem here is that
>>> the SMARTS really don't match the molecule under Daylight's aromaticity
>>> model. All ring atoms there are aromatic, and an explicit '=' in SMARTS
>>> doesn't match an aromatic bond ('=,:' is the way to do that).
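
To make the point about '=' versus '=,:' concrete, here is a minimal sketch
in plain Java of how Daylight-style bond primitives behave once a bond has
been flagged aromatic. It is not CDK's implementation: the class and method
names are hypothetical, and only the primitives '-', '=', '#', ':', '~' and
the ',' (or) operator are handled.

```java
// Hypothetical sketch, not CDK code: evaluates a simple SMARTS bond
// expression against a bond described by its order and aromatic flag.
class BondExprSketch {

    // One primitive: explicit orders only match non-aromatic bonds.
    static boolean matchesPrimitive(char p, int order, boolean aromatic) {
        switch (p) {
            case '-': return !aromatic && order == 1;
            case '=': return !aromatic && order == 2;
            case '#': return !aromatic && order == 3;
            case ':': return aromatic;
            case '~': return true;
            default:
                throw new IllegalArgumentException("unsupported primitive: " + p);
        }
    }

    // Comma-separated alternatives: the expression matches if any one does.
    static boolean matches(String expr, int order, boolean aromatic) {
        for (String alt : expr.split(",")) {
            if (alt.length() != 1) {
                throw new IllegalArgumentException("unsupported expression: " + expr);
            }
            if (matchesPrimitive(alt.charAt(0), order, aromatic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A ring bond drawn as double but perceived as aromatic:
        System.out.println(matches("=", 2, true));    // false
        System.out.println(matches("=,:", 2, true));  // true
        // The same bond if the ring were not perceived as aromatic:
        System.out.println(matches("=,:", 2, false)); // true
        System.out.println(matches("-", 1, true));    // false
    }
}
```

This is why writing '-,:' and '=,:' on ring bonds lets one pattern match
whichever aromaticity model was applied to the molecule.
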
>>>
>>> You can try out SMARTS on CDKDepict:
>>> http://cdkdepict-openchem.rhcloud.com/depict.html
>>>
>>> SMILES:
>>> COC1=C(O)C=C2OC=C(C(=O)C2=C1)C3=CC=C(O)C=C3
>>>
>>> Correct SMARTS:
>>> [O;X1]=[#6;R1]-,:1-,:[#6;R1](=,:[#6;R1]-,:[#8]-,:c2ccccc-,:12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1
>>>
>>>
>>> Doubly confirmed with OpenBabel
>>>
>>>
>>> [sovereign ~/Downloads]: obgrep '[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1' glycitein.sdf
>>> [sovereign ~/Downloads]: obgrep '[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1' glycitein.sdf
>>>
>>>
>>> Here is what the normal code would look like once the SMARTS are
>>> changed. SmartsPattern perceives aromaticity automatically.
>>>
>>>     // Imports needed by this snippet (CDK 1.5 package names)
>>>     import java.io.FileReader;
>>>     import org.openscience.cdk.interfaces.IAtomContainer;
>>>     import org.openscience.cdk.interfaces.IChemObjectBuilder;
>>>     import org.openscience.cdk.io.MDLV2000Reader;
>>>     import org.openscience.cdk.isomorphism.Pattern;
>>>     import org.openscience.cdk.silent.SilentChemObjectBuilder;
>>>     import org.openscience.cdk.smiles.smarts.SmartsPattern;
>>>
>>>     IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>>     Pattern ptrn1 = SmartsPattern.create("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1", null);
>>>     Pattern ptrn2 = SmartsPattern.create("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1", null);
>>>
>>>     try (MDLV2000Reader mrdr = new MDLV2000Reader(new FileReader("/Users/john/Downloads/glycitein.sdf"))) {
>>>         IAtomContainer mol;
>>>         // read each molecule and test both patterns against it
>>>         while ((mol = mrdr.read(bldr.newInstance(IAtomContainer.class, 0, 0, 0, 0))) != null) {
>>>             System.err.println("p1: " + ptrn1.matches(mol));
>>>             System.err.println("p2: " + ptrn2.matches(mol));
>>>         }
>>>     }
>>>
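>>> Here's the code where we use a different aromaticity model. This is
>>> lower level, hence some more setup is needed.
>>>
>>>     // Imports needed by this snippet (CDK 1.5 package names)
>>>     import java.io.FileReader;
>>>     import org.openscience.cdk.aromaticity.Aromaticity;
>>>     import org.openscience.cdk.aromaticity.ElectronDonation;
>>>     import org.openscience.cdk.graph.Cycles;
>>>     import org.openscience.cdk.interfaces.IAtomContainer;
>>>     import org.openscience.cdk.interfaces.IChemObjectBuilder;
>>>     import org.openscience.cdk.io.MDLV2000Reader;
>>>     import org.openscience.cdk.isomorphism.Pattern;
>>>     import org.openscience.cdk.isomorphism.VentoFoggia;
>>>     import org.openscience.cdk.isomorphism.matchers.smarts.SmartsMatchers;
>>>     import org.openscience.cdk.silent.SilentChemObjectBuilder;
>>>     import org.openscience.cdk.smiles.smarts.parser.SMARTSParser;
>>>
>>>     IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>>     Pattern ptrn1 = VentoFoggia.findSubstructure(SMARTSParser.parse("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1", null));
>>>     Pattern ptrn2 = VentoFoggia.findSubstructure(SMARTSParser.parse("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1", null));
>>>     Aromaticity arom = new Aromaticity(ElectronDonation.piBonds(), Cycles.all(6));
>>>     try (MDLV2000Reader mrdr = new MDLV2000Reader(new FileReader("/Users/john/Downloads/glycitein.sdf"))) {
>>>         IAtomContainer mol;
>>>         while ((mol = mrdr.read(bldr.newInstance(IAtomContainer.class, 0, 0, 0, 0))) != null) {
>>>             arom.apply(mol);                   // apply the chosen aromaticity model
>>>             SmartsMatchers.prepare(mol, true); // set up the invariants SMARTS matching needs
>>>             System.err.println("p1: " + ptrn1.matches(mol));
>>>             System.err.println("p2: " + ptrn2.matches(mol));
>>>         }
>>>     }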
>>>
>>>
>>>
>>> Regards,
>>> John W May
>>> john.wilkinson...@gmail.com
>>>
>>> On 19 May 2016 at 06:55, Yannick .Djoumbou <y.djoum...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am having some issues with the CDK library.
>>>>
>>>> I have the molecule "glycitein" in the attached file (glycitein.sdf).
>>>> I am running the SMARTSQueryTool to perform structure search. The
>>>> SMARTS patterns are the following:
>>>>
>>>>
>>>> P1:
>>>> [O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1
>>>>
>>>>
>>>> P2:
>>>> [O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1
>>>>
>>>> For each of those, the query tool returns false, which is
>>>> really surprising. I imagine it still has to do with the Aromaticity
>>>> detection or a related issue. I have tried many things and it seems that
>>>> they do not always work as they should.
>>>>
>>>> 1) I therefore preprocessed the molecule using the code below (from a
>>>> previous chat I had on a forum):
>>>>
>>>> SMSDNormalizer.percieveAtomTypesAndConfigureAtoms(molecule);
>>>> CDKHydrogenAdder.getInstance(molecule.getBuilder())
>>>>         .addImplicitHydrogens(molecule);
>>>> for (IBond bond : molecule.bonds()) {
>>>>     if (bond.getFlag(CDKConstants.SINGLE_OR_DOUBLE)) {
>>>>         bond.setFlag(CDKConstants.ISAROMATIC, true);
>>>>         bond.getAtom(0).setFlag(CDKConstants.ISAROMATIC, true);
>>>>         bond.getAtom(1).setFlag(CDKConstants.ISAROMATIC, true);
>>>>     }
>>>> }
>>>> SMSDNormalizer.aromatizeMolecule(molecule);
>>>>
>>>>
>>>> I attached the resulting structure in SDF format as returned by CDK
>>>> (glycitein_processed.sdf), which most editors display as in the
>>>> attached picture. It seems that all the aromatic bonds (marked as 4) in
>>>> the SDF are perceived as single bonds.
>>>>
>>>> Therefore, the result of the structure search is still "FALSE".
>>>>
>>>> By the way, trying a combination of AtomContainerManipulator (to
>>>> perceive atom types) and Aromaticity
>>>> <http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/aromaticity/Aromaticity.html>
>>>> did not help either.
>>>>
>>>>
>>>>
>>>> 2) Instead of aromatizing, I removed the SMSDNormalizer lines and
>>>> added the following:
>>>>
>>>> AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
>>>> Kekulization.kekulize(molecule);
>>>>
>>>> The SDF of the resulting molecule is the same, and so is the result.
>>>>
>>>>
>>>> How can I process these molecules efficiently?
>>>>
>>>>
>>>> I am writing a function that will take SDF files, and run
>>>> the SMARTSQueryTool to match certain patterns. Therefore, I need an
>>>> efficient way to preprocess these molecules.
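
For the "take SDF files" part of such a function, here is a minimal sketch in
plain Java (no CDK) that splits the text of an SD file into its records on
the "$$$$" delimiter, so each record can then be handed to whatever reader
and matcher are used. The class and method names are hypothetical, and data
items between "M  END" and "$$$$" are simply kept as part of the record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CDK: splits SD file text into records.
class SdfSplitSketch {

    static List<String> splitRecords(String sdf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (String line : sdf.split("\\R", -1)) {
            if (line.trim().equals("$$$$")) {
                // Delimiter line: close the current record.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }

        // Keep a trailing record that is not followed by a delimiter.
        if (!current.toString().trim().isEmpty()) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String sdf = "mol A\nM  END\n$$$$\nmol B\nM  END\n$$$$\n";
        List<String> records = splitRecords(sdf);
        System.out.println(records.size());             // prints 2
        System.out.println(records.get(1).split("\\R")[0]); // prints mol B
    }
}
```
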
>>>>
>>>>
>>>> Can someone help me out here?
>>>>
>>>>
>>>> Thank you in advance.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Mobile security can be enabling, not merely restricting. Employees who
>>>> bring their own devices (BYOD) to work are irked by the imposition of
>>>> MDM
>>>> restrictions. Mobile Device Manager Plus allows you to control only the
>>>> apps on BYO-devices by containerizing them, leaving personal data
>>>> untouched!
>>>> https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
>>>> _______________________________________________
>>>> Cdk-user mailing list
>>>> Cdk-user@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>>
>>>>
>>>
>>
>