Re: [Rdkit-discuss] drawing code take 3
Yes, of course, storing the images is an alternative.

-P.

On Thu, Dec 15, 2016 at 5:46 PM, Dimitri Maziuk wrote:
> On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
>
> > Obviously, it doesn't matter if you're rendering just a few structures,
> > but in a scenario where you might be downloading a hundred SMILES from a
> > DB and displaying them on a grid in a browser, computing the 2D
> > depictions on the fly, waiting 5 sec for a page refresh wouldn't be great.
>
> Maybe not, but depending on how the browser lays out the grid, it may take
> 5 seconds anyway.
>
> My recommendation for that use case would be to pre-generate the images
> and store the URLs in that database. Which is what we do here.
>
> ;)
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] drawing code take 3
On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
> Obviously, it doesn't matter if you're rendering just a few structures,
> but in a scenario where you might be downloading a hundred SMILES from a
> DB and displaying them on a grid in a browser, computing the 2D
> depictions on the fly, waiting 5 sec for a page refresh wouldn't be great.

Maybe not, but depending on how the browser lays out the grid, it may take
5 seconds anyway.

My recommendation for that use case would be to pre-generate the images
and store the URLs in that database. Which is what we do here.

;)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
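For anyone wanting to try the pre-generate-and-store idea, here is a minimal sketch of one way it could look. It is not how BMRB does it. The table layout, the `image_name` naming scheme, the `https://example.org/img/` base URL and the `placeholder_render` function are all assumptions made up for illustration; in practice the renderer would be whatever depiction code you use, and the image would be written to wherever your web server serves files from.

```python
import hashlib
import sqlite3
from typing import Callable, Iterable, Optional


def image_name(smiles: str) -> str:
    """Derive a stable file name from the SMILES string (hypothetical scheme)."""
    return hashlib.sha1(smiles.encode("utf-8")).hexdigest() + ".svg"


def pregenerate(conn: sqlite3.Connection, smiles_list: Iterable[str],
                render: Callable[[str], str], base_url: str) -> int:
    """Render each SMILES once and store its URL; return how many were rendered."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS depictions "
        "(smiles TEXT PRIMARY KEY, url TEXT, image TEXT)"
    )
    rendered = 0
    for smi in smiles_list:
        seen = conn.execute(
            "SELECT 1 FROM depictions WHERE smiles = ?", (smi,)
        ).fetchone()
        if seen:
            continue  # already depicted: skip the expensive layout step
        # The image text is kept in the table only to keep this sketch
        # self-contained; a real setup would write it to disk instead.
        conn.execute(
            "INSERT INTO depictions VALUES (?, ?, ?)",
            (smi, base_url + image_name(smi), render(smi)),
        )
        rendered += 1
    conn.commit()
    return rendered


def lookup_url(conn: sqlite3.Connection, smiles: str) -> Optional[str]:
    """Page-load path: a cheap lookup, no 2D layout computed."""
    row = conn.execute(
        "SELECT url FROM depictions WHERE smiles = ?", (smiles,)
    ).fetchone()
    return row[0] if row else None


def placeholder_render(smiles: str) -> str:
    """Stand-in for real depiction code (assumption, not an RDKit call)."""
    return f"<svg><!-- depiction of {smiles} --></svg>"


conn = sqlite3.connect(":memory:")
first_pass = pregenerate(conn, ["CCO", "c1ccccc1", "CCO"],
                         placeholder_render, "https://example.org/img/")
second_pass = pregenerate(conn, ["CCO"],
                          placeholder_render, "https://example.org/img/")
print(first_pass, second_pass, lookup_url(conn, "CCO"))
```

The point of the design is that the slow step runs once per structure at load time, so a grid page only pays for database lookups.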
Re: [Rdkit-discuss] drawing code take 3
Well, Figure 10 shows that a molecule with about 25 heavy atoms takes about
50 ms to optimize. In John Mayfield's UGM talk, it looks like CDK is taking
an average of 1 ms for "easy" structures and 56 ms for the hard ones, some of
which are depicted and have far more than 25 heavy atoms. We don't know the
details of the two data sets, so a head-to-head comparison is tough, but
intuitively, 20 structures/sec sounds slow.

Having said that, it's reasonable to pay a price in speed for additional
quality and robustness.

Obviously, it doesn't matter if you're rendering just a few structures, but
in a scenario where you might be downloading a hundred SMILES from a DB and
displaying them on a grid in a browser, computing the 2D depictions on the
fly, waiting 5 sec for a page refresh wouldn't be great.

-P.

On Thu, Dec 15, 2016 at 4:22 PM, Dimitri Maziuk wrote:
> On 12/15/2016 02:53 PM, Peter S. Shenkin wrote:
> > Looks good, but maybe too slow for production use... (?)
>
> I wonder what kind of production use would require sub-second wall clock
> time for this.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
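The arithmetic behind the 20 structures/sec and 5 sec figures above can be checked in a few lines. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes structures are laid out one after another with no parallelism or caching.

```python
def structures_per_second(ms_per_structure: float) -> float:
    """Throughput implied by a per-structure layout time in milliseconds."""
    return 1000.0 / ms_per_structure


def page_latency_seconds(n_structures: int, ms_per_structure: float) -> float:
    """Time to lay out a whole grid serially, in seconds."""
    return n_structures * ms_per_structure / 1000.0


# About 50 ms per structure, the figure quoted for ~25 heavy atoms.
print(structures_per_second(50))      # 20.0
print(page_latency_seconds(100, 50))  # 5.0

# About 1 ms per structure, the figure quoted for CDK on "easy" structures.
print(page_latency_seconds(100, 1))   # 0.1
```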
Re: [Rdkit-discuss] drawing code take 3
On 12/15/2016 02:53 PM, Peter S. Shenkin wrote:
> Looks good, but maybe too slow for production use... (?)

I wonder what kind of production use would require sub-second wall clock
time for this.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] drawing code take 3
Looks good, but maybe too slow for production use... (?)

-P.

On Thu, Dec 15, 2016 at 3:38 PM, Chris Swain wrote:
> At first glance this looks an interesting approach.
>
> Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram
> Generation of Complex Molecules and Ligand–Protein Interactions
> DOI: http://dx.doi.org/10.1021/acs.jcim.6b00391
Re: [Rdkit-discuss] drawing code take 3
At first glance this looks an interesting approach.

Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram
Generation of Complex Molecules and Ligand–Protein Interactions
DOI: http://dx.doi.org/10.1021/acs.jcim.6b00391

> On 27 Sep 2016, at 05:38, rdkit-discuss-requ...@lists.sourceforge.net wrote:
>
> 2D drawing code is tough. The 90/10 rule applies: the last 10% of
> correctness takes 90% of the effort.
>
> I like Dmitri Agrafiotis's method, but IIRC it's patented; also, though
> it's good for rough work, it doesn't produce "beautiful" structural
> diagrams.
>
> Some of the 2D drawing methods that do produce "pretty" pictures have a
> large number of templates built in that match the most common (and even
> somewhat uncommon) motifs, and they fall down when they hit something they
> can't get a close enough match for. And then, the IUPAC has a whole list of
> "desirable" features in 2D diagrams (as in, "Don't show it this way, but
> rather show it that way."). So even if you produce what might appear to be
> an acceptable drawing, it might not match the IUPAC list of desirables.
>
> I think for the present purposes what we need is something correct, robust
> and legible, and of course the example shown does not exhibit that. (But I
> don't know what the starting SMILES is, so I don't know whether the
> 7-bonded C is due to a bad SMILES, in which case all bets are off.)
>
> In addition, I think some discussion earlier indicated that the RDKit 2D
> structures look much worse when H's are included.
>
> I actually wrote a code one time (while at Schrödinger) to give a "badness"
> score to 2D structures. When our 2D depiction development was in progress,
> we created 2D SD files for many thousands of structures. I could put these
> through the program and sort with the worst on top. That allowed the most
> severe problems to be identified more quickly than, say, looking at
> thousands of 2D diagrams. The program looked at three things: Number of
> bonds that crossed, Number of atoms that were too close together, and Large
> disparity of bond lengths within the same molecule. (The checking code
> didn't deal with labels.)
>
> Writing the checker was a fun project, but I'm glad I didn't have to write
> the 2D depiction code. As Mark Twain said, "Improving oneself is good.
> Improving others is better -- and easier."
>
> -P.
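The three checks described in the quoted message (crossed bonds, atoms too close together, bond-length disparity) could be sketched roughly as below. This is not the Schrödinger checker, whose details are not given in the thread. The close-contact threshold (half the median bond length), the disparity measure (max minus min over mean) and the equal weights are all assumptions chosen for illustration, and labels are ignored, as in the original.

```python
import math
import statistics
from itertools import combinations


def _orient(p, q, r):
    """Signed area test: > 0 if r is left of the line p->q, < 0 if right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(a, b, c, d):
    """True if segment ab strictly crosses segment cd."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0
            and _orient(a, b, c) * _orient(a, b, d) < 0)


def badness(coords, bonds, close_fraction=0.5):
    """Return (crossings, close_contacts, length_disparity) for a 2D layout.

    coords: list of (x, y) per atom; bonds: list of (i, j) atom index pairs.
    """
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]

    # 1. Bonds that cross (bonds sharing an atom cannot cross).
    crossings = sum(
        1 for (i, j), (k, l) in combinations(bonds, 2)
        if len({i, j, k, l}) == 4
        and _segments_cross(coords[i], coords[j], coords[k], coords[l])
    )

    # 2. Non-bonded atoms closer than a fraction of the median bond length.
    bonded = {frozenset(b) for b in bonds}
    cutoff = close_fraction * statistics.median(lengths)
    close_contacts = sum(
        1 for i, j in combinations(range(len(coords)), 2)
        if frozenset((i, j)) not in bonded
        and math.dist(coords[i], coords[j]) < cutoff
    )

    # 3. Spread of bond lengths relative to their mean.
    disparity = (max(lengths) - min(lengths)) / statistics.mean(lengths)
    return crossings, close_contacts, disparity


def score(coords, bonds, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are arbitrary here."""
    return sum(w * t for w, t in zip(weights, badness(coords, bonds)))


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
layouts = {
    "square": [(0, 0), (1, 0), (1, 1), (0, 1)],   # clean four-membered ring
    "bowtie": [(0, 0), (1, 1), (1, 0), (0, 1)],   # same ring, two bonds crossed
}
# Sort with the worst on top, as in the workflow described above.
ranked = sorted(layouts, key=lambda name: score(layouts[name], ring),
                reverse=True)
print(ranked)
```

Sorting a large batch this way only surfaces candidates for visual inspection; it says nothing about whether a low-scoring drawing is actually pleasing.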