Hi Andres,

Excellent analysis, thank you. It's good to see that the recent phenol change should bring things into closer agreement.
John

On Wed, 24 Nov 2021 at 21:29, Andres Fernando Bernal Escobar <andresf.bern...@utadeo.edu.co> wrote:

> Hello John, thanks for your answer. I ran a quick comparison between CDK and PubChem with a few hand-picked molecules. These are the results:
> https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing
>
> I split the molecules into four subsets. The first comprises seemingly non-problematic molecules: carboxylic acids, amines, aliphatic esters and aliphatic ethers. In these cases CDK, PubChem and my own intuition are all in agreement.
>
> The second subset comprises molecules where I think CDK is wrong and PubChem is correct: phenols. This is due to the issue that you corrected in the branch you linked.
>
> The third subset comprises molecules where I think CDK is correct and PubChem is wrong: aromatic ethers, amides and nitro compounds. In the case of aromatic ethers, we know CDK explicitly introduces a correction to exclude aromatic ether oxygens from the HB acceptor count. I am not a specialist, but I understand there are sound reasons to make this exception. PubChem doesn't seem to implement it. In the case of amides and nitro compounds I don't quite understand what is going on with PubChem, but CDK's answer seems correct to me.
>
> The last subset comprises aromatic esters (acyloxy substituents). I honestly don't know what is correct in this case. Are the oxygen atoms of aromatic esters also an exception, just like those of aromatic ethers? That would mean CDK is right. Otherwise, another correction is needed to make sure CDK excludes no oxygens on aromatic rings other than those of ethers.
>
> On Tue, 23 Nov 2021 at 04:27, John Mayfield (john.wilkinson...@gmail.com) wrote:
>
>> Thanks for your email. I've always thought the CDK HBond acceptor/donor code is a little wonky and needs investigating. I don't have time to look deeply at it, but yes, my reading is that it doesn't check for the ether oxygen correctly. If someone were so inclined, checking CDK's (and RDKit's) values against PubChem would be a quick project that may provide some insight into missed cases and disagreements.
>>
>> I've made a change here to get the correct value for phenol:
>> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>>
>> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <guillermo.restr...@mis.mpg.de> wrote:
>>
>>> We are working with some descriptors taken from the Reaxys database, which, according to its owner, are computed using your CDK library. We found something unexpected and would very much appreciate it if you could help us understand it.
>>>
>>> We noted that some phenols are reported as having 0 hydrogen bond acceptors, whereas we expected them to have at least one. We checked the CDK source code and found this comment in HBondAcceptorCountDescriptor.java:
>>>
>>> The following groups are counted as hydrogen bond acceptors:
>>> - any oxygen where the formal charge of the oxygen is non-positive (i.e. formal charge <= 0) except
>>>   - an aromatic ether oxygen (i.e. an ether oxygen that is adjacent to at least one aromatic carbon)
>>>   - an oxygen that is adjacent to a nitrogen
>>> - any nitrogen where the formal charge of the nitrogen is non-positive (i.e. formal charge <= 0) except
>>>   - a nitrogen that is adjacent to an oxygen
>>>
>>> The way we understood it, this means that phenols should have at least one hydrogen bond acceptor.
>>> But further down in the same file, these lines seem to specify otherwise:
>>>
>>> // looking for suitable oxygen atoms
>>> else if (atom.getAtomicNumber() == IElement.O && atom.getFormalCharge() <= 0) {
>>>     // excluding oxygens that are adjacent to a nitrogen or to an aromatic carbon
>>>     List<IBond> neighbours = ac.getConnectedBondsList(atom);
>>>     for (IBond bond : neighbours) {
>>>         IAtom neighbor = bond.getOther(atom);
>>>         if (neighbor.getAtomicNumber() == IElement.N ||
>>>             (neighbor.getAtomicNumber() == IElement.C &&
>>>              neighbor.isAromatic() &&
>>>              bond.getOrder() != IBond.Order.DOUBLE))
>>>             continue atomloop;
>>>     }
>>>     hBondAcceptors++;
>>> }
>>>
>>> Is this intended, or is it a bug, or are we misunderstanding something?
>>>
>>> _______________________________________________
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
> --
>
> *Andrés Bernal*
> *Área de Ciencias Básicas y Modelado*
> *Associate Professor*
> Ext. 1705
> andresf.bern...@utadeo.edu.co
> Address (Utadeo): Carrera 4 # 22-61
>
> *WARNING ABOUT CONFIDENTIAL INFORMATION*
>
> The opinions expressed herein do not necessarily reflect the positions of the Universidad de Bogotá Jorge Tadeo Lozano. The information contained in this electronic mail and attachments is confidential and intended only for the use of the individual or entity to whom it is addressed and may have confidential data. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or any other use of the information is strictly prohibited and has legal repercussions. Therefore, if you have received this document by mistake, please notify the sender immediately and destroy this document and attachments without making any copy of any kind.
>
> This message has been tested by antivirus software. Nonetheless, the Universidad de Bogotá Jorge Tadeo Lozano assumes no liability for any damages or loss of any kind that might arise from the use of, misuse of, or the inability to use the materials contained on this electronic message. It is the responsibility of the recipient to verify by his own means the presence of a virus or any other harmful components, defects or errors.
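To make the phenol discrepancy concrete, here is a minimal, self-contained Java sketch. It is not CDK code and not the change in the linked branch. The Atom and Bond classes, the method names, and the assumption that an "ether" oxygen is one with exactly two single-bonded carbon neighbours are all hypothetical, chosen only for illustration. The sketch applies the quoted oxygen loop and one reading of the Javadoc rule to phenol, anisole (an aromatic ether) and phenyl acetate (an aromatic ester). It covers oxygen only and leaves out the nitrogen rules.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only (NOT CDK's IAtom/IBond); oxygen rules only, nitrogen omitted.
public class HBondAcceptorSketch {
    static final class Atom {
        final String symbol;
        final boolean aromatic;
        final int charge;
        final List<Bond> bonds = new ArrayList<>();
        Atom(String s, boolean ar, int q) { symbol = s; aromatic = ar; charge = q; }
    }

    static final class Bond {
        final Atom a, b;
        final int order; // 1 = single, 2 = double
        Bond(Atom a, Atom b, int order) { this.a = a; this.b = b; this.order = order; }
        Atom other(Atom x) { return x == a ? b : a; }
    }

    static void connect(Atom a, Atom b, int order) {
        Bond bond = new Bond(a, b, order);
        a.bonds.add(bond);
        b.bonds.add(bond);
    }

    // Mirrors the quoted loop: reject an oxygen bonded to N, or bonded by a
    // non-double bond to ANY aromatic carbon (which also rejects a phenol OH).
    static boolean acceptorAsQuoted(Atom o) {
        for (Bond bond : o.bonds) {
            Atom nbr = bond.other(o);
            if (nbr.symbol.equals("N")
                    || (nbr.symbol.equals("C") && nbr.aromatic && bond.order != 2))
                return false;
        }
        return true;
    }

    // One reading of the Javadoc: reject an oxygen bonded to N, or an ether oxygen next to an
    // aromatic carbon. "Ether" is ASSUMED to mean exactly two single-bonded carbon neighbours.
    static boolean acceptorAsDocumented(Atom o) {
        int singleCarbons = 0;
        boolean aromaticCarbon = false;
        for (Bond bond : o.bonds) {
            Atom nbr = bond.other(o);
            if (nbr.symbol.equals("N")) return false;
            if (nbr.symbol.equals("C") && bond.order == 1) {
                singleCarbons++;
                aromaticCarbon |= nbr.aromatic;
            }
        }
        boolean ether = o.bonds.size() == 2 && singleCarbons == 2;
        return !(ether && aromaticCarbon);
    }

    // Counts oxygens with formal charge <= 0 that the chosen rule accepts.
    static int countOxygenAcceptors(List<Atom> atoms, boolean documented) {
        int count = 0;
        for (Atom atom : atoms) {
            if (!atom.symbol.equals("O") || atom.charge > 0) continue;
            if (documented ? acceptorAsDocumented(atom) : acceptorAsQuoted(atom)) count++;
        }
        return count;
    }

    // Heavy atoms only, hydrogens implicit; only the ipso ring carbon is modelled.
    static List<Atom> phenol() { // c-OH
        Atom c = new Atom("C", true, 0), o = new Atom("O", false, 0);
        connect(c, o, 1);
        return List.of(c, o);
    }

    static List<Atom> anisole() { // c-O-CH3
        Atom c = new Atom("C", true, 0), o = new Atom("O", false, 0), me = new Atom("C", false, 0);
        connect(c, o, 1);
        connect(o, me, 1);
        return List.of(c, o, me);
    }

    static List<Atom> phenylAcetate() { // c-O-C(=O)-CH3
        Atom c = new Atom("C", true, 0), oEster = new Atom("O", false, 0);
        Atom carbonyl = new Atom("C", false, 0), oCarbonyl = new Atom("O", false, 0);
        Atom me = new Atom("C", false, 0);
        connect(c, oEster, 1);
        connect(oEster, carbonyl, 1);
        connect(carbonyl, oCarbonyl, 2);
        connect(carbonyl, me, 1);
        return List.of(c, oEster, carbonyl, oCarbonyl, me);
    }

    public static void main(String[] args) {
        String[] names = {"phenol", "anisole", "phenyl acetate"};
        List<List<Atom>> mols = List.of(phenol(), anisole(), phenylAcetate());
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": quoted=" + countOxygenAcceptors(mols.get(i), false)
                    + " documented=" + countOxygenAcceptors(mols.get(i), true));
        }
    }
}
```

Run as written, it prints quoted=0 documented=1 for phenol, 0 under both rules for anisole, and 1 under both rules for phenyl acetate. The phenol line corresponds to the behaviour Guillermo reported. The ester line shows why Andres's last question matters. Both versions count only the carbonyl oxygen. Under the documented reading, the ester's single-bonded oxygen is excluded only because this sketch treats it as an ether oxygen, and that is an assumption rather than a settled answer.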