Hi Alexis,

Knowing what you want to achieve, I would approach the problem the other way 
around. Instead of matching your many generalized fragments against your input 
structure, I would apply the same transformation(s) you apply to your fragments 
to your input structure as well.
I know that you replace all non-hydrogen atoms with "any" atoms, and all 
single/double/triple bonds with "any" bonds. Instead, you could store a list of 
fragments in which all non-hydrogen atoms are replaced with carbons and all 
bonds with single bonds, and calculate and store the fingerprints of these 
fragments. Finally, apply the same transformation to your input structure, 
calculate its fingerprint, and run your substructure search.
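The transformation described above might be sketched as follows. This is a minimal sketch under stated assumptions, not Grégori's code: `generic_skeleton` and `is_generic_substructure` are hypothetical helper names, "non-hydrogen" is taken to include `*` atoms (atomic number 0), and the skeleton is rebuilt from scratch rather than edited in place so that SMARTS query molecules can be converted too.

```python
from rdkit import Chem


def generic_skeleton(mol):
    """Hypothetical helper: rebuild `mol` as a plain (non-query) molecule in
    which every non-hydrogen atom is a carbon and every bond is single.

    A fresh RWMol is built instead of editing atoms in place, so this also
    works on query molecules that came from Chem.MolFromSmarts.
    """
    skeleton = Chem.RWMol()
    index_map = {}
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 1:  # keeps heavy atoms and "*" (atomic number 0)
            index_map[atom.GetIdx()] = skeleton.AddAtom(Chem.Atom(6))
    for bond in mol.GetBonds():
        begin, end = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        if begin in index_map and end in index_map:
            skeleton.AddBond(index_map[begin], index_map[end], Chem.BondType.SINGLE)
    generic = skeleton.GetMol()
    # Full sanitization would reject carbons with more than four neighbours
    # (e.g. coming from hypervalent S or P), so valences are computed leniently
    # and rings are perceived so that ring information is available.
    generic.UpdatePropertyCache(strict=False)
    Chem.FastFindRings(generic)
    return generic


def is_generic_substructure(frag_mol, structure_mol):
    """Hypothetical driver: pattern-fingerprint screen on the generic
    skeletons, then an exact graph match only if the screen passes."""
    frag_generic = generic_skeleton(frag_mol)
    struct_generic = generic_skeleton(structure_mol)
    frag_bits = set(Chem.PatternFingerprint(frag_generic).GetOnBits())
    struct_bits = set(Chem.PatternFingerprint(struct_generic).GetOnBits())
    if not frag_bits.issubset(struct_bits):
        return False  # a fragment bit is missing: no match possible
    return struct_generic.HasSubstructMatch(frag_generic)
```

In a real search you would compute the fragment skeletons and their fingerprints once and store them, as suggested above, and only transform and fingerprint each input structure at search time.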

Best,

Grégori


On Monday, February 10, 2020 16:08 CET, Alexis Parenty 
<alexis.parenty.h...@gmail.com> wrote:
 
Dear Rdkiters,
I am interested in running substructure searches between many thousands of 
structures and many thousands of fragments, as quickly as possible, with 
reasonable accuracy (> 0.95)...
I read Greg's excellent post on that subject:
http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
I was using the RDKit pattern fingerprint approach to filter out any fragments 
that have no chance of matching the bigger structure before running the slower, 
more accurate molecular graph match, which saves a lot of time.
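The two-stage approach (a cheap fingerprint screen, then an exact graph match only for the pairs that survive it) could look roughly like this for SMILES inputs. `screen_then_match` is a hypothetical helper; it precomputes the fragment fingerprints once and assumes every input string is valid SMILES.

```python
from rdkit import Chem


def screen_then_match(fragment_smiles, structure_smiles):
    """Hypothetical helper: return (fragment, structure) SMILES pairs where
    the fragment is confirmed to be a substructure of the structure."""
    # Fragment molecules and fingerprint bits are computed once, up front.
    fragments = []
    for smi in fragment_smiles:
        mol = Chem.MolFromSmiles(smi)
        bits = set(Chem.PatternFingerprint(mol).GetOnBits())
        fragments.append((smi, mol, bits))

    hits = []
    for smi in structure_smiles:
        mol = Chem.MolFromSmiles(smi)
        structure_bits = set(Chem.PatternFingerprint(mol).GetOnBits())
        for frag_smi, frag_mol, frag_bits in fragments:
            # Cheap screen first; the slow graph match only runs if every
            # fragment bit is also set in the structure's fingerprint.
            if frag_bits <= structure_bits and mol.HasSubstructMatch(frag_mol):
                hits.append((frag_smi, smi))
    return hits
```

For example, `screen_then_match(["c1ccccc1", "C(=O)O"], ["c1ccccc1O", "CCO"])` returns `[("c1ccccc1", "c1ccccc1O")]`.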
However, I realized that this pattern fingerprint approach only works well when 
comparing SMILES with SMILES:
 
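from rdkit import Chem

def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
    # Both the fragment and the structure are parsed as SMILES.
    pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
    pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))

    frag_bits = set(pfp_frag.GetOnBits())
    structure_bits = set(pfp_structure.GetOnBits())

    # The fragment can only be a substructure if every bit set in its
    # fingerprint is also set in the structure's fingerprint.
    return frag_bits.issubset(structure_bits)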
 
Unfortunately, some of my fragments are SMARTS that are not valid SMILES. Using 
Chem.MolFromSmarts(smarts) gives really poor results (many false positives, 
leading to poor specificity). Interestingly, there are no false negatives, 
giving a sensitivity of 1!
 
from rdkit import Chem

def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
    # The fragment is parsed as SMARTS (a query molecule),
    # the structure as SMILES.
    pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
    pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))

    frag_bits = set(pfp_frag.GetOnBits())
    structure_bits = set(pfp_structure.GetOnBits())

    # Same subset test as in the SMILES-only version.
    return frag_bits.issubset(structure_bits)
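Sensitivity and specificity of a screen like this can be measured by treating the exact graph match as ground truth. A minimal, library-free sketch: `screen_metrics` is a hypothetical helper, and its two inputs are assumed to be parallel lists of booleans holding, for each fragment/structure pair, the fingerprint screen's verdict and the HasSubstructMatch verdict.

```python
def screen_metrics(screen_passed, truly_matches):
    """Hypothetical helper: sensitivity and specificity of a fingerprint screen.

    screen_passed[i] is True if the fingerprint screen kept pair i;
    truly_matches[i] is True if an exact graph match (e.g. HasSubstructMatch)
    confirmed pair i.
    """
    tp = fp = tn = fn = 0
    for passed, matches in zip(screen_passed, truly_matches):
        if passed and matches:
            tp += 1  # screen kept a real match
        elif passed:
            fp += 1  # screen kept a non-match (costs an extra graph match)
        elif matches:
            fn += 1  # screen discarded a real match (a lost hit)
        else:
            tn += 1  # screen correctly discarded a non-match
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For example, `screen_metrics([True, True, False, True], [True, False, False, False])` returns `(1.0, 0.3333333333333333)`: no real match was discarded, but two of the three non-matches slipped through the screen.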
 
Is there a way to use pattern fingerprints (or another method) for substructure 
searches regardless of whether the fragments are written as SMILES or SMARTS? If 
not, is mol_struct.HasSubstructMatch(mol_frag) the only option left?
Many thanks,

Alexis
 
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
