Hi Andres,

Excellent analysis, thank you. Good to see that the recent phenol change
should bring CDK and PubChem into closer agreement.

John

On Wed, 24 Nov 2021 at 21:29, Andres Fernando Bernal Escobar <
andresf.bern...@utadeo.edu.co> wrote:

> Hello John, thanks for your answer. I ran a quick comparison between CDK
> and PubChem, with a few hand-picked molecules. These are the results:
> https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing
>
> I split the molecules into four subsets. The first comprises seemingly
> non-problematic molecules: carboxylic acids, amines, aliphatic esters and
> aliphatic ethers. In these cases CDK, PubChem and my own intuition all
> agree.
>
> The second subset comprises molecules where I think CDK is wrong and
> PubChem is correct: phenols. This is due to the issue that you corrected in
> the branch you linked.
>
> The third subset comprises molecules where I think CDK is correct and
> PubChem is wrong: aromatic ethers, amides, nitro compounds. In the case of
> aromatic ethers, we know CDK explicitly introduces a correction to exclude
> aromatic ether oxygens from the HB acceptors count. I am not a specialist,
> but I understand there are sound reasons to make this exception. PubChem
> doesn't seem to implement it. In the case of amides and nitro compounds, I
> don't quite understand what PubChem is doing, but CDK's answer seems
> correct to me.
>
> The last subset comprises aromatic esters (acyloxy substituents). I
> honestly don't know what is correct in this case. Are oxygen atoms from
> aromatic esters also an exception, just as those from aromatic ethers? That
> would mean CDK is right. Otherwise, another correction is needed so that,
> among oxygens attached to aromatic rings, CDK excludes only ether oxygens.
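The aromatic ester question can be made concrete with a small sketch. It uses a hypothetical toy molecular graph (`ArylEsterOxygen`, `Mol`, `excluded` and `countOxygens` are illustrative names, not CDK API) and counts only oxygen acceptors in phenyl acetate and anisole under two readings of the exclusion: a "loose" one, where any oxygen with two single-bonded carbon neighbours (at least one aromatic) is excluded, and a hypothetical "strict" one that keeps the oxygen when either neighbour is a carbonyl carbon. It does not settle which reading is chemically correct.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical toy graph (heavy atoms, neutral, implicit H
// ignored). This is NOT the CDK IAtomContainer API.
public class ArylEsterOxygen {

    record Atom(String symbol, boolean aromatic) {}

    record Bond(int a, int b, int order) {
        int other(int i) { return i == a ? b : a; }
    }

    static final class Mol {
        final List<Atom> atoms = new ArrayList<>();
        final List<Bond> bonds = new ArrayList<>();

        int add(String symbol, boolean aromatic) {
            atoms.add(new Atom(symbol, aromatic));
            return atoms.size() - 1;
        }

        void bond(int a, int b, int order) { bonds.add(new Bond(a, b, order)); }

        List<Bond> bondsOf(int i) {
            List<Bond> out = new ArrayList<>();
            for (Bond b : bonds) if (b.a() == i || b.b() == i) out.add(b);
            return out;
        }
    }

    // True if carbon c carries a C=O double bond (a carbonyl carbon).
    static boolean isCarbonylCarbon(Mol m, int c) {
        for (Bond b : m.bondsOf(c))
            if (b.order() == 2 && m.atoms.get(b.other(c)).symbol().equals("O")) return true;
        return false;
    }

    // Loose: two single-bonded C neighbours, at least one aromatic.
    // Strict (hypothetical): additionally, no neighbour may be a carbonyl carbon.
    static boolean excluded(Mol m, int o, boolean strict) {
        List<Bond> bs = m.bondsOf(o);
        if (bs.size() != 2) return false;
        boolean aromatic = false;
        for (Bond b : bs) {
            int n = b.other(o);
            Atom nb = m.atoms.get(n);
            if (!nb.symbol().equals("C") || b.order() != 1) return false;
            if (strict && isCarbonylCarbon(m, n)) return false;
            aromatic |= nb.aromatic();
        }
        return aromatic;
    }

    static int countOxygens(Mol m, boolean strict) {
        int n = 0;
        for (int i = 0; i < m.atoms.size(); i++)
            if (m.atoms.get(i).symbol().equals("O") && !excluded(m, i, strict)) n++;
        return n;
    }

    // CC(=O)Oc1ccccc1; only the ring carbon bonded to O is modelled.
    static Mol phenylAcetate() {
        Mol m = new Mol();
        int ring = m.add("C", true);
        int oEster = m.add("O", false);
        int cCarbonyl = m.add("C", false);
        m.bond(ring, oEster, 1);
        m.bond(oEster, cCarbonyl, 1);
        m.bond(cCarbonyl, m.add("O", false), 2);
        m.bond(cCarbonyl, m.add("C", false), 1);
        return m;
    }

    // COc1ccccc1; only the ring carbon bonded to O is modelled.
    static Mol anisole() {
        Mol m = new Mol();
        int ring = m.add("C", true);
        int o = m.add("O", false);
        m.bond(ring, o, 1);
        m.bond(o, m.add("C", false), 1);
        return m;
    }

    public static void main(String[] args) {
        System.out.printf("phenyl acetate: loose=%d strict=%d%n",
                countOxygens(phenylAcetate(), false), countOxygens(phenylAcetate(), true));
        System.out.printf("anisole:        loose=%d strict=%d%n",
                countOxygens(anisole(), false), countOxygens(anisole(), true));
    }
}
```

Under the loose reading, phenyl acetate gets one oxygen acceptor (the carbonyl oxygen); under the strict reading, the ester oxygen is counted as well, giving two. Anisole gets zero either way.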
>
> On Tue, 23 Nov 2021 at 04:27, John Mayfield (
> john.wilkinson...@gmail.com) wrote:
>
>> Thanks for your email. I've always thought the CDK HBond acceptor/donor
>> code is a little wonky and needs investigating. I don't have time to look
>> deeply at it, but yes, my reading is that it doesn't check for the ether
>> oxygen correctly. If someone were inclined, checking CDK's (and RDKit's)
>> values against PubChem would be a quick project that may provide some
>> insight into missed cases and disagreements.
>>
>> I've made a change here to get the correct value for phenol:
>> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>>
>> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <
>> guillermo.restr...@mis.mpg.de> wrote:
>>
>>> We are working with some descriptors taken from the Reaxys database,
>>> which, according to its owner, are computed using your CDK library. We
>>> found something unexpected and would very much appreciate it if you could
>>> help us understand it.
>>>
>>> We noticed that some phenols are reported as having 0 hydrogen bond
>>> acceptors, whereas we expected them to have at least one. We checked the
>>> CDK source code and found this comment in HBondAcceptorCountDescriptor.java:
>>>
>>> The following groups are counted as hydrogen bond acceptors:
>>> - any oxygen where the formal charge of the oxygen is non-positive
>>>   (i.e. formal charge <= 0) except
>>>     - an aromatic ether oxygen (i.e. an ether oxygen that is adjacent
>>>       to at least one aromatic carbon)
>>>     - an oxygen that is adjacent to a nitrogen
>>> - any nitrogen where the formal charge of the nitrogen is non-positive
>>>   (i.e. formal charge <= 0) except
>>>     - a nitrogen that is adjacent to an oxygen
>>>
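The documented rule can be read as a per-atom predicate. The Java sketch below runs on a hypothetical toy graph (`DocumentedAcceptorRule`, `Mol`, `count` and the other names are illustrative, not the CDK `IAtomContainer` API) and assumes that an "aromatic ether oxygen" means an oxygen with exactly two single-bonded carbon neighbours, at least one of them aromatic. Under that reading, phenol's hydroxyl oxygen is counted and anisole's ether oxygen is not.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical toy graph (heavy atoms, implicit H ignored).
// This is NOT the CDK IAtomContainer API.
public class DocumentedAcceptorRule {

    record Atom(String symbol, int charge, boolean aromatic) {}

    record Bond(int a, int b, int order) {
        int other(int i) { return i == a ? b : a; }
    }

    static final class Mol {
        final List<Atom> atoms = new ArrayList<>();
        final List<Bond> bonds = new ArrayList<>();

        int add(String symbol, int charge, boolean aromatic) {
            atoms.add(new Atom(symbol, charge, aromatic));
            return atoms.size() - 1;
        }

        void bond(int a, int b, int order) { bonds.add(new Bond(a, b, order)); }

        List<Bond> bondsOf(int i) {
            List<Bond> out = new ArrayList<>();
            for (Bond b : bonds) if (b.a() == i || b.b() == i) out.add(b);
            return out;
        }
    }

    static boolean hasNeighbour(Mol m, int i, String symbol) {
        for (Bond b : m.bondsOf(i))
            if (m.atoms.get(b.other(i)).symbol().equals(symbol)) return true;
        return false;
    }

    // Assumption: an "aromatic ether oxygen" has exactly two single-bonded
    // carbon neighbours, at least one of which is aromatic.
    static boolean isAromaticEtherOxygen(Mol m, int o) {
        List<Bond> bs = m.bondsOf(o);
        if (bs.size() != 2) return false;
        boolean aromaticCarbon = false;
        for (Bond b : bs) {
            Atom n = m.atoms.get(b.other(o));
            if (!n.symbol().equals("C") || b.order() != 1) return false;
            aromaticCarbon |= n.aromatic();
        }
        return aromaticCarbon;
    }

    // Counts acceptors as described in the Javadoc comment.
    static int count(Mol m) {
        int n = 0;
        for (int i = 0; i < m.atoms.size(); i++) {
            Atom a = m.atoms.get(i);
            if (a.charge() > 0) continue;  // only formal charge <= 0 qualifies
            boolean oxygen = a.symbol().equals("O")
                    && !isAromaticEtherOxygen(m, i)
                    && !hasNeighbour(m, i, "N");
            boolean nitrogen = a.symbol().equals("N") && !hasNeighbour(m, i, "O");
            if (oxygen || nitrogen) n++;
        }
        return n;
    }

    // Only the ring carbon bonded to O is modelled; the rule never looks
    // beyond an atom's direct neighbours.
    static Mol phenol() {   // Oc1ccccc1
        Mol m = new Mol();
        int ring = m.add("C", 0, true);
        m.bond(ring, m.add("O", 0, false), 1);
        return m;
    }

    static Mol anisole() {  // COc1ccccc1
        Mol m = new Mol();
        int ring = m.add("C", 0, true);
        int o = m.add("O", 0, false);
        m.bond(ring, o, 1);
        m.bond(o, m.add("C", 0, false), 1);
        return m;
    }

    public static void main(String[] args) {
        System.out.println("phenol:  " + count(phenol()));   // 1
        System.out.println("anisole: " + count(anisole()));  // 0
    }
}
```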
>>> As we understand it, this means that phenols should have at least one
>>> hydrogen bond acceptor. But further down in the same file, these lines
>>> seem to say otherwise:
>>>
>>>     // looking for suitable oxygen atoms
>>>     else if (atom.getAtomicNumber() == IElement.O && atom.getFormalCharge() <= 0) {
>>>         // excluding oxygens that are adjacent to a nitrogen or to an aromatic carbon
>>>         List<IBond> neighbours = ac.getConnectedBondsList(atom);
>>>         for (IBond bond : neighbours) {
>>>             IAtom neighbor = bond.getOther(atom);
>>>             if (neighbor.getAtomicNumber() == IElement.N ||
>>>                 (neighbor.getAtomicNumber() == IElement.C &&
>>>                  neighbor.isAromatic() &&
>>>                  bond.getOrder() != IBond.Order.DOUBLE))
>>>                 continue atomloop;
>>>         }
>>>         hBondAcceptors++;
>>>     }
>>>
>>> Is this intended, or is it a bug, or are we misunderstanding something?
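The behaviour of the quoted loop can be reproduced on a hypothetical toy graph (`QuotedOxygenLoop`, `Mol` and `countOxygens` are illustrative names, not CDK API). With `etherOnly == false` the sketch mirrors the quoted oxygen test; with `etherOnly == true` it applies a hypothetical tweak in which the aromatic-carbon exclusion only fires for two-coordinate oxygens. The tweak is an illustration, not necessarily the change made in the bug/hbondacceptor branch. Because the quoted loop never checks how many neighbours the oxygen has, a phenol O-H bonded to an aromatic carbon is skipped exactly like an aryl ether oxygen.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical toy graph (heavy atoms, implicit H ignored).
// This is NOT the CDK IAtomContainer API.
public class QuotedOxygenLoop {

    record Atom(String symbol, int charge, boolean aromatic) {}

    record Bond(int a, int b, int order) {
        int other(int i) { return i == a ? b : a; }
    }

    static final class Mol {
        final List<Atom> atoms = new ArrayList<>();
        final List<Bond> bonds = new ArrayList<>();

        int add(String symbol, int charge, boolean aromatic) {
            atoms.add(new Atom(symbol, charge, aromatic));
            return atoms.size() - 1;
        }

        void bond(int a, int b, int order) { bonds.add(new Bond(a, b, order)); }

        List<Bond> bondsOf(int i) {
            List<Bond> out = new ArrayList<>();
            for (Bond b : bonds) if (b.a() == i || b.b() == i) out.add(b);
            return out;
        }
    }

    // etherOnly == false mirrors the quoted loop: skip the oxygen if ANY
    // neighbour is N, or is an aromatic C reached through a non-double bond.
    // etherOnly == true (hypothetical): the aromatic-carbon test applies only
    // when the oxygen has two neighbours, i.e. when it could be an ether.
    static int countOxygens(Mol m, boolean etherOnly) {
        int n = 0;
        atomloop:
        for (int i = 0; i < m.atoms.size(); i++) {
            Atom a = m.atoms.get(i);
            if (!a.symbol().equals("O") || a.charge() > 0) continue;
            List<Bond> bonds = m.bondsOf(i);
            for (Bond b : bonds) {
                Atom nb = m.atoms.get(b.other(i));
                boolean aromaticC = nb.symbol().equals("C") && nb.aromatic() && b.order() != 2;
                if (nb.symbol().equals("N") || (aromaticC && (!etherOnly || bonds.size() == 2)))
                    continue atomloop;
            }
            n++;
        }
        return n;
    }

    // Only the ring carbon bonded to O is modelled.
    static Mol phenol() {   // Oc1ccccc1
        Mol m = new Mol();
        int ring = m.add("C", 0, true);
        m.bond(ring, m.add("O", 0, false), 1);
        return m;
    }

    static Mol anisole() {  // COc1ccccc1
        Mol m = new Mol();
        int ring = m.add("C", 0, true);
        int o = m.add("O", 0, false);
        m.bond(ring, o, 1);
        m.bond(o, m.add("C", 0, false), 1);
        return m;
    }

    public static void main(String[] args) {
        for (boolean etherOnly : new boolean[] {false, true})
            System.out.printf("etherOnly=%b  phenol=%d  anisole=%d%n", etherOnly,
                    countOxygens(phenol(), etherOnly), countOxygens(anisole(), etherOnly));
    }
}
```

As written, the quoted test gives phenol zero oxygen acceptors; the ether-only variant gives one, while anisole stays at zero in both cases.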
>>>
>>>
>>>
>>> _______________________________________________
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>
>
>
> --
>
>
> *Andrés Bernal*
> *Basic Sciences and Modeling Area*
> *Associate Professor*
> Ext. 1705
> andresf.bern...@utadeo.edu.co
> Utadeo address: Carrera 4 # 22-61
>
>
>
>  *WARNING ABOUT CONFIDENTIAL INFORMATION*
>
> The opinions expressed herein do not necessarily reflect the positions of
> the Universidad de Bogotá Jorge Tadeo Lozano. The information contained in
> this electronic mail and attachments is confidential and intended only for
> the use of the individual or entity to whom it is addressed and may have
> confidential data. If you are not the intended recipient, you are hereby
> notified that any disclosure, copying, distribution, or any other use of
> the information is strictly prohibited and has legal repercussions.
> Therefore, if you have received this document by mistake, please notify the
> sender immediately and destroy this document and attachments without making
> any copy of any kind.
>
> This message has been tested by antivirus software. Nonetheless, the
> Universidad de Bogotá Jorge Tadeo Lozano assumes no liability for any
> damages or loss of any kind that might arise from the use of, misuse of, or
> the inability to use the materials contained on this electronic message. It
> is the responsibility of the recipient to verify by his own means the
> presence of a virus or any other harmful components, defects or errors.
>