On Apr 16, 2018, at 05:37, Patrick Walters <wpwalt...@gmail.com> wrote:
> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I
> wrote something to work directly on a molecule.

That's the approach I started with, until I figured out that it doesn't
preserve chirality. If I change the end of your code to:

==========
from mmpdblib import smiles_syntax

def weld_dalke(core, r_groups):
    s1 = smiles_syntax.convert_labeled_wildcards_to_closures(core)
    s2 = smiles_syntax.convert_labeled_wildcards_to_closures(r_groups)
    return Chem.CanonSmiles(s1+"."+s2)

if __name__ == "__main__":
    mol_to_weld = Chem.MolFromSmiles("[*:1][C@](F)(Cl)O.N[*:1]")
    welded_mol = weld_r_groups(mol_to_weld)
    print("Expected :", Chem.CanonSmiles("N[C@](F)(Cl)O"))
    print("Direct :", Chem.MolToSmiles(welded_mol, isomericSmiles=True))
    print("Via SMILES:", weld_dalke("[*:1][C@](F)(Cl)O", "N[*:1]"))
==========

These should print identical SMILES strings, but instead give:

Expected : N[C@](O)(F)Cl
Direct : N[C@@](O)(F)Cl
Via SMILES: N[C@](O)(F)Cl

The direct approach inverts the stereocenter ([C@@] instead of [C@]). If
chirality preservation isn't a concern, then there's no problem.

BTW, your current code assumes there will only be one attachment point on
an atom. For example, the input

  [*:1][C@]([*:2])(Cl)O.N[*:1].F[*:2]

creates the output

  N.O[C](F)Cl

It's not hard to fix, and I think it's more of a d'oh! issue.

In a quick benchmark I put together just now, I found that my SMILES syntax
manipulation approach was about twice as fast at turning the two core/R-group
SMILES strings into a molecule.

Cheers,

                                Andrew
                                da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
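
The two-attachment-point input above suggests a cheap sanity check to run before welding: list the labels carried by each dot-separated fragment and confirm that every label occurs exactly twice. The following is only a rough sketch under stated assumptions. The helper names `attachment_labels` and `unpaired_labels` are hypothetical and are not part of RDKit or mmpdblib. It uses a regular expression rather than a SMILES parser, and it does not detect whether two labels sit on the same atom.

```python
import re
from collections import Counter

# Matches a labeled wildcard such as [*:1] and captures the label number.
_LABELED_WILDCARD = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(smiles):
    """Return the attachment-point labels found in each dot-separated fragment."""
    return [
        [int(label) for label in _LABELED_WILDCARD.findall(fragment)]
        for fragment in smiles.split(".")
    ]


def unpaired_labels(smiles):
    """Return the sorted labels that do not occur exactly twice in the input."""
    counts = Counter(
        label for labels in attachment_labels(smiles) for label in labels
    )
    return sorted(label for label, n in counts.items() if n != 2)


if __name__ == "__main__":
    example = "[*:1][C@]([*:2])(Cl)O.N[*:1].F[*:2]"
    # One list of labels per fragment; a fragment with several labels has
    # several attachment points and deserves a closer look.
    print(attachment_labels(example))
    # An empty list means every label is paired.
    print(unpaired_labels(example))
```

A fragment that reports more than one label is where the one-attachment-point assumption is most likely to break, so it is a reasonable place to add a test case.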