Hi John,

Thanks so much for your quick reply. Could you tell me which fingerprints can 
generate unfolded features, and whether there is a table of the subgraphs that 
represent these features for those fingerprints? Or is there a way to generate 
the subgraphs for these features from CDK?

Best,
Allen

From: John Mayfield<mailto:john.wilkinson...@gmail.com>
Sent: Tuesday, 29 August 2023 3:05 am
To: Chong Kim San Allen<mailto:kimsanallen.ch...@ntu.edu.sg>; 
cdkuser<mailto:cdk-user@lists.sourceforge.net>
Subject: Re: [Cdk-user] Extended Fingerprint: what do the features represent?


The features represent subgraphs of the input molecule. For a binary 
fingerprint they are hashed (irreversibly) and there is a many-to-one mapping 
between the subgraph and the hash (i.e. the value 14 you see). We do not 
currently provide a general way to see what features hash to which values but 
some fingerprints have an option to generate the features "unfolded". We should 
add this option in more places since it can be useful.
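
To illustrate the many-to-one mapping described above, here is a minimal 
sketch in plain Java. It does not use CDK: the class FoldingSketch, its fold 
and index methods, and the "subgraph-N" labels are all hypothetical, and 
String.hashCode() only stands in for whatever hash a real fingerprinter 
applies. It shows that folding 1025 distinct features into 1024 bits must put 
at least two features on the same bit, and that the feature-to-bit table can 
only be kept if it is recorded before folding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: this is NOT CDK's hash function or API.
final class FoldingSketch {

    // Fold an arbitrary feature label into one of `size` bit positions.
    // floorMod keeps the result in [0, size) even for negative hash codes.
    static int fold(String feature, int size) {
        return Math.floorMod(feature.hashCode(), size);
    }

    // Record which features landed on each bit while they are being hashed.
    // Once only the bits are kept, this reverse lookup cannot be rebuilt.
    static Map<Integer, List<String>> index(List<String> features, int size) {
        Map<Integer, List<String>> byBit = new TreeMap<>();
        for (String f : features) {
            byBit.computeIfAbsent(fold(f, size), k -> new ArrayList<>()).add(f);
        }
        return byBit;
    }

    public static void main(String[] args) {
        // 1025 distinct labels into 1024 bits: a collision is guaranteed.
        List<String> features = new ArrayList<>();
        for (int i = 0; i <= 1024; i++) {
            features.add("subgraph-" + i);
        }
        Map<Integer, List<String>> byBit = index(features, 1024);

        // Print the first bit that is shared by more than one feature.
        for (Map.Entry<Integer, List<String>> e : byBit.entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println("bit " + e.getKey() + " <- " + e.getValue());
                break;
            }
        }
    }
}
```

Keeping an index like this while the features are generated is, in spirit, 
what the "unfolded" option preserves. Given only a folded bit such as 14, the 
subgraphs behind it cannot be recovered.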

Best Wishes,
John

On Mon, 28 Aug 2023 at 19:06, Chong Kim San Allen via Cdk-user 
<cdk-user@lists.sourceforge.net> wrote:
Dear Helpdesk,

I have used CDK to generate the Extended Fingerprints for a couple of compounds 
and I found that certain features are common among my compounds. For example, 
“14” keeps showing up. I would like to know what “14” represents. I know that the 
default path length is 7 so I was wondering if the feature is a chemical 
substructure? The default size for Extended Fingerprint is 1024 so I was 
wondering if there is a way to figure out what each of the 1024 features 
represents.

Similarly, if I generate ECFP6, which has 2^32 possible features (count version), 
is there a way for me to figure out what each of those features is? If a feature 
appears to have a high count and I wanted to figure out what this feature was, 
is there a command I can use to find out what that feature represents?

Thanks in advance for your help.

Best,
Allen



_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user

