Hi John,

Thanks so much for your quick reply. Could you tell me which fingerprints can generate unfolded features, and whether there is a table of the subgraphs that these features represent for those fingerprints? Alternatively, is there a way to generate the subgraphs for these features from CDK?
Best,
Allen

From: John Mayfield <john.wilkinson...@gmail.com>
Sent: Tuesday, 29 August 2023 3:05 am
To: Chong Kim San Allen <kimsanallen.ch...@ntu.edu.sg>; cdkuser <cdk-user@lists.sourceforge.net>
Subject: Re: [Cdk-user] Extended Fingerprint: what do the features represent?

The features represent subgraphs of the input molecule. For a binary fingerprint they are hashed (irreversibly), and there is a many-to-one mapping between subgraphs and hash values (i.e. the value 14 you see). We do not currently provide a general way to see which features hash to which values, but some fingerprints have an option to generate the features "unfolded". We should add this option in more places since it can be useful.

Best Wishes,
John

On Mon, 28 Aug 2023 at 19:06, Chong Kim San Allen via Cdk-user <cdk-user@lists.sourceforge.net> wrote:

Dear Helpdesk,

I have used CDK to generate the Extended Fingerprints for a couple of compounds, and I found that certain features are common among my compounds. For example, “14” keeps showing up. I would like to know what “14” is. I know that the default path length is 7, so I was wondering if the feature is a chemical substructure. The default size for the Extended Fingerprint is 1024, so I was wondering if there is a way to figure out what each of the 1024 features represents.

Similarly, if I generate ECFP6, which has 2^32 features (count version), is there a way for me to figure out what each of those features is? If a feature has a high count and I want to figure out what it is, is there a command I can use to find out what that feature represents?

Thanks in advance for your help.

Best,
Allen

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
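The many-to-one mapping described in John's reply can be illustrated with a small, self-contained sketch. It does not use CDK and is not CDK's hashing scheme: the class `FoldingSketch`, its methods `fold` and `unfold`, and the path strings standing in for subgraphs are all hypothetical, chosen only to show why a folded bit such as 14 cannot be traced back to a single substructure unless the features are also recorded before folding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: NOT CDK code and NOT the hash function CDK uses.
public class FoldingSketch {

    // Fold an arbitrary 32-bit hash into a bit index in [0, size).
    // floorMod keeps the result non-negative for negative hashes.
    public static int fold(int hash, int size) {
        return Math.floorMod(hash, size);
    }

    // Keep an "unfolded" record: bit index -> every feature that landed on it.
    public static Map<Integer, List<String>> unfold(List<String> features, int size) {
        Map<Integer, List<String>> byBit = new TreeMap<>();
        for (String feature : features) {
            int bit = fold(feature.hashCode(), size);
            byBit.computeIfAbsent(bit, k -> new ArrayList<>()).add(feature);
        }
        return byBit;
    }

    public static void main(String[] args) {
        // Hypothetical path strings standing in for subgraphs of a molecule.
        List<String> features = List.of("C-C", "C-O", "C-C-O", "C=O", "C-N", "c:c:c");
        // Six features into four bits: at least two features must share a bit.
        Map<Integer, List<String>> byBit = unfold(features, 4);
        byBit.forEach((bit, fs) -> System.out.println(bit + " <- " + fs));
    }
}
```

Given only the set bits, the list of features behind each bit is lost. Recovering it requires keeping the pre-folding features alongside the fingerprint, which is the role an "unfolded" option plays.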