>
> I wonder why I haven't asked this before. How is this done on OrChem?
>
> "These columns provide a quick way to materialize a basic CDK molecule to
> be passed into the VF2 algorithm.  The data structures used are quite
> straightforward, for instance with data in column atom "C O" interpreted
> as: "atom 0 is Carbon, atom 1 is Oxygen" and bond column "0 1 D Y" then
> implying "there is a bond between C (atom 0) and O (atom 1) that is double
> (D) and aromatic is true (Y)". In this way, CDK molecules can be generated
> very fast without the need for calculating any properties during the
> search."


Yes, 'properties' here means mainly aromaticity, which is relatively
expensive to calculate. So the database molecules get a fingerprint but
they also get stored in a simple format for quick re-assembly before going
into VF2.
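
To make the quoted column format concrete, here is a minimal Python sketch of
reading the two columns back into a plain graph. It is not OrChem's actual
code. Only the codes "D" (double) and "Y" (aromatic) appear in the quoted
text; the codes "S" and "T", and the idea that several bonds are stored as
consecutive groups of four tokens, are assumptions made for illustration.

```python
# Assumed order codes: only "D" is confirmed by the quoted text,
# "S" and "T" are guesses for single and triple bonds.
ORDER_CODES = {"S": 1, "D": 2, "T": 3}


def parse_atoms(column):
    """Split the atom column into element symbols, indexed by atom number."""
    return column.split()


def parse_bonds(column):
    """Read the bond column four tokens at a time:
    atom index, atom index, order code, aromatic flag ("Y" means aromatic)."""
    tokens = column.split()
    if len(tokens) % 4 != 0:
        raise ValueError("bond column must hold groups of four tokens")
    bonds = []
    for k in range(0, len(tokens), 4):
        i, j, order, aromatic = tokens[k:k + 4]
        bonds.append((int(i), int(j), ORDER_CODES[order], aromatic == "Y"))
    return bonds


atoms = parse_atoms("C O")
bonds = parse_bonds("0 1 D Y")
```

Nothing here perceives aromaticity or any other property; the flags are read
straight from the stored text, which is the point of the format.
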
>
> Is this VF2 the turbo-substructure algorithm? Or a custom one?

It's custom. I still have to check whether I can replace it with SMSD and
whether that would be faster. At the time, the CDK did not offer a VF2
algorithm.


> Do you create real CDK molecules or just some kind of graph representation
> that you put into your custom VF2?

The latter, just what I need for VF2.

>
> Which properties do you need for VF2? Implicit hydrogens? Or is it enough
> to assign an atom its symbol "C" and each bond an order?

Element symbol, explicit hydrogens, bond orders, aromaticity. I don't
include charge, though I probably should, so for now the search is
charge-insensitive.
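
As a rough illustration of what matching on just these properties could look
like, here is a hypothetical Python sketch of the atom and bond compatibility
checks a VF2-style matcher would call. The names Atom, Bond, atoms_compatible
and bonds_compatible are invented for this example, explicit hydrogens are
assumed to be ordinary atoms with symbol "H", and the rule that aromatic bonds
only match aromatic bonds is an assumption, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str
    charge: int = 0  # stored, but deliberately never compared


@dataclass(frozen=True)
class Bond:
    order: int
    aromatic: bool


def atoms_compatible(query, target):
    """Atoms match on element symbol only, so the search ignores charge."""
    return query.symbol == target.symbol


def bonds_compatible(query, target):
    """Assumed rule: aromatic bonds match only aromatic bonds,
    all other bonds must have the same order."""
    if query.aromatic or target.aromatic:
        return query.aromatic == target.aromatic
    return query.order == target.order
```

Making the search charge-sensitive would, under these assumptions, only need
one extra comparison in atoms_compatible.
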

It also really helps to 'sort' the atom container; this is described
somewhere in the paper. Try to let VF2 match the least common elements
first (generally not the carbons or oxygens). When I benchmarked that,
performance shot up.
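
One possible way to do such a sort, as a minimal Python sketch rather than the
method from the paper: reorder the atoms by how common their element is and
renumber the bond endpoints to match. The function name sort_rarest_first, the
(i, j, order, aromatic) bond tuples and the element counts are all made up for
this example; where the real frequencies come from is not specified here.

```python
from collections import Counter


def sort_rarest_first(atoms, bonds, element_counts):
    """Reorder atoms so rarer elements come first, then renumber bonds.

    Elements missing from element_counts count as 0 and so sort first.
    Ties keep their original relative order.
    """
    order = sorted(
        range(len(atoms)),
        key=lambda i: (element_counts.get(atoms[i], 0), i),
    )
    new_index = {old: new for new, old in enumerate(order)}
    sorted_atoms = [atoms[old] for old in order]
    sorted_bonds = [
        (new_index[i], new_index[j], bond_order, aromatic)
        for i, j, bond_order, aromatic in bonds
    ]
    return sorted_atoms, sorted_bonds


# Hypothetical element counts, invented for illustration.
element_counts = Counter({"C": 1000, "O": 300, "N": 200, "Br": 5})

sorted_atoms, sorted_bonds = sort_rarest_first(
    ["C", "C", "Br", "O"],
    [(0, 1, 1, False), (1, 2, 1, False), (0, 3, 2, False)],
    element_counts,
)
```

With the bromine moved to index 0, a matcher that walks the atoms in order
tries the rare atom first and can reject most non-matching candidates early.
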

cheers,
Mark




_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
