Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
Thanks, Nils, for pointing out both algorithms to the list. Interestingly, Greg is putting together a scaffold tree algorithm in this PR https://github.com/rdkit/rdkit/pull/2911 so anyone will be able to try it in the near future, hopefully in a 2020 release. Best regards, Maciek Wójcikowski

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
The function takes two Explicit or Sparse bit vectors. Could you elaborate on what you mean when you say it accepts SMARTS only? PatternFingerprints will work with SMARTS too. It is always more effective to have the SMARTS as explicit as possible, since if you have all alternative atoms, the FP cannot

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Nils Weskamp
Hi Alexis, if you go down that route and calculate artificial skeletons, you could also go all the way and use an algorithm like HierS [1] or the scaffold tree [2] to perform a recursive fragmentation of your queries and molecules into their various rings and ring systems. If a query contains a

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Alexis Parenty
Hi Maciek, thanks for your response. I did try that function too, but it also takes SMILES only (not SMARTS). I think Gregori's solution is very interesting: I am going to transform all SMILES and SMARTS into their single-bonded-carbon-based skeleton and will store the pattern fingerprint of

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
Alexis, I believe that `DataStructs.AllProbeBitsMatch(query_fp,mol_fp)` is the function you are looking for here. You can find more advanced usage and code snippets in the RDKit blog post that Greg put together here: https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html Best,
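For readers skimming the thread, the test that a call like `AllProbeBitsMatch(query_fp, mol_fp)` performs can be illustrated without RDKit. The following is only a rough sketch: bit vectors are modeled as plain Python integers, and `toy_fingerprint`, `all_probe_bits_match`, `screen`, and the feature strings are hypothetical names invented for this illustration, not RDKit API.

```python
# Plain-Python sketch of a fingerprint screen (no RDKit required).
# Bit vectors are modeled as ints; all names and features here are hypothetical.
import zlib

FP_SIZE = 64  # illustrative size only; real pattern fingerprints are much larger


def toy_fingerprint(features):
    """Hash each feature string to one bit of an integer bit vector."""
    fp = 0
    for feature in features:
        fp |= 1 << (zlib.crc32(feature.encode("utf-8")) % FP_SIZE)
    return fp


def all_probe_bits_match(probe_fp, ref_fp):
    """Return True if every bit set in probe_fp is also set in ref_fp."""
    return (probe_fp & ref_fp) == probe_fp


def screen(query_fp, library):
    """Return the names of library entries that survive the bit screen."""
    return [name for name, fp in library.items()
            if all_probe_bits_match(query_fp, fp)]


# Toy "molecules" described by made-up feature strings.
library = {
    "mol_a": toy_fingerprint(["C-C", "C-N", "C=O"]),
    "mol_b": toy_fingerprint(["C-C", "C-O"]),
}
query_fp = toy_fingerprint(["C-C", "C-N"])
candidates = screen(query_fp, library)
```

A screen of this kind never rejects a true match, because a superset of features always sets a superset of bits. Hash collisions can let non-matching entries through, so the survivors in `candidates` would still need a full substructure match to confirm.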

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread gregori
Hi Alexis, Knowing what you want to achieve, I would take the problem the other way around. Instead of matching your many fragments to your input structure, I would rather apply the same transformation(s) you apply to your fragments to your input structure. I know that you replace all

[Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Alexis Parenty
Dear Rdkiters, I am interested in doing substructure searches between many thousands of structures and many thousands of fragments, as quickly as possible, with reasonable accuracy (> 0.95)... I did read Greg's excellent post on that subject: