Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images
Hi Samo,

I don't think you need temporary files at all; have you tried something like this?

    from io import BytesIO
    from rdkit.Chem import Draw

    # row, molCol and size come from the surrounding export function
    image_file = BytesIO()
    img = Draw.MolToImage(row[molCol], size=size)
    # Render the molecule as PNG into the in-memory buffer instead of a file
    img.save(image_file, format='PNG')
    image_data = image_file.getvalue()

Grégori

On 03. 11. 14 21:44, Samo Turk wrote:

Hi Grégori,

Thanks for pointing this out. I modified the code and now it writes only one temporary file.

Cheers,
Samo

On Fri, Oct 31, 2014 at 10:56 AM, Grégori Gerebtzoff greg...@gerebtzoff.com wrote:

Hi Samo,

A few years ago I used the PHPExcel library to put images into an Excel file, and it was not necessary to use physical files. Having a quick look at the library, I found this class (probably the one I used): PHPExcel_Worksheet_MemoryDrawing (source code: https://github.com/clariondoor/PHPExcel/blob/master/Worksheet/MemoryDrawing.php).

The interesting bit:

    public function __construct()
    {
        // Initialise values
        $this->_imageResource     = null;
        $this->_renderingFunction = self::RENDERING_DEFAULT;
        $this->_mimeType          = self::MIMETYPE_DEFAULT;
        $this->_uniqueName        = md5(rand(0, ). time() . rand(0, ));

        // Initialize parent
        parent::__construct();
    }

Thus I'm pretty sure you can use the same trick in Python with XlsxWriter (have a look at the _add_image_files function in packager.py): use a random file name and a byte stream for the image, as described here: http://xlsxwriter.readthedocs.org/en/latest/example_images_bytesio.html#ex-images-bytesio

    from io import BytesIO

    filename = 'python.png'

    with open(filename, 'rb') as image_file:
        image_data = BytesIO(image_file.read())

    # Write the byte stream image to a cell. The filename must be specified.
    # (worksheet is an existing XlsxWriter worksheet)
    worksheet.insert_image('B8', filename, {'image_data': image_data})

At least it's worth a try!

Another trick I had to use, both with PHPExcel and in VBA, was to set the width of columns three times to make sure that it was actually correct. Don't ask me why... Just in case you run into width issues.

Good luck!
Grégori

On 30. 10. 14 16:49, Samo Turk wrote:

Hi rdkiters,

Due to popular demand I started working on a function to export a pandas DataFrame to xlsx with embedded molecule images. Because of the xlsx specifics, the code is not optimal. The most annoying thing about this implementation is that it has to write all images to the hard drive before it packs them into the xlsx file (and deletes them at the end). I checked two Python xlsx libraries and both save images that way. If someone finds a better solution, please share it.

The dimensions of cells with images are not optimal because Excel is weird. :) From the XlsxWriter docs:

"The width corresponds to the column width value that is specified in Excel. It is approximately equal to the length of a string in the default font of Calibri 11. Unfortunately, there is no way to specify “AutoFit” for a column in the Excel file format."

The export crashes if a cell value has the wrong type, so use df['value'].astype() to fix incorrectly assigned types.

The resulting files work nicely in Office 365 (standalone and web app), but for some reason they don't work well with LibreOffice (after row ~125 it stacks all images).

I made a pull request on GitHub: https://github.com/rdkit/rdkit/pull/371
Demo: http://nbviewer.ipython.org/github/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/XLSX%20export.ipynb
Demo xlsx file: https://github.com/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/demo.xlsx

Regards,
Samo

--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
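Samo's remark about cell dimensions can be made a little more concrete. The sketch below converts an image size in pixels into a column width and a row height for XlsxWriter. It assumes the conversion XlsxWriter applies for the default Calibri 11 font (about 7 pixels per character of column width plus 5 pixels of padding, and 4/3 pixels per point of row height); the helper names are hypothetical, and other fonts or display settings may give different results.

```python
# Hypothetical helpers for sizing Excel cells to fit an embedded image.
# Assumption: XlsxWriter's conversion for the default Calibri 11 font, where a
# column of width w >= 1 is int(w * 7 + 0.5) + 5 pixels wide, a narrower column
# is int(w * 12 + 0.5) pixels wide, and 1 point of row height is 4/3 pixels.
MAX_DIGIT_WIDTH_PX = 7
COLUMN_PADDING_PX = 5


def column_width_for_pixels(pixels):
    """Return the column width (in characters) that is about `pixels` wide."""
    narrow_limit = MAX_DIGIT_WIDTH_PX + COLUMN_PADDING_PX  # width 1.0 == 12 px
    if pixels < narrow_limit:
        return pixels / float(narrow_limit)
    return (pixels - COLUMN_PADDING_PX) / float(MAX_DIGIT_WIDTH_PX)


def row_height_for_pixels(pixels):
    """Return the row height in points for an image `pixels` tall."""
    return pixels * 0.75


# Size the cells for a 200 x 200 pixel molecule image.
print(round(column_width_for_pixels(200), 2))  # 27.86
print(row_height_for_pixels(200))  # 150.0
```

As a sanity check of these assumptions, a 64-pixel column comes out at 59/7, about 8.43 characters, which is Excel's default column width.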
Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images
Hi Grégori,

I had already tried exactly what you suggested yesterday, but it didn't work. I did find a solution after some fiddling, and now it works: https://github.com/rdkit/rdkit/pull/371/files

Thanks!

On Tue, Nov 4, 2014 at 9:32 AM, Grégori Gerebtzoff greg...@gerebtzoff.com wrote:

Hi Samo, I don't think you need temporary files at all; have you tried something like this? [...]
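Keeping each PNG in memory and handing it to XlsxWriter through its image_data option avoids temporary files entirely. The following is a minimal sketch of that idea, not the code from the pull request: it assumes RDKit (with Pillow, which Draw.MolToImage needs) and XlsxWriter are installed, takes a plain list of SMILES instead of a DataFrame, and uses a hypothetical helper name, write_mols_to_xlsx. Cell sizes use the approximate Calibri 11 conversion (about 7 pixels per character plus 5 pixels of padding, 0.75 points per pixel).

```python
from io import BytesIO

import xlsxwriter
from rdkit import Chem
from rdkit.Chem import Draw


def write_mols_to_xlsx(smiles_list, out_path, size=(200, 200)):
    """Hypothetical helper: SMILES in column A, a PNG depiction in column B.

    Images are rendered into in-memory buffers, so no temporary files are written.
    """
    width_px, height_px = size
    workbook = xlsxwriter.Workbook(out_path)
    worksheet = workbook.add_worksheet()
    worksheet.set_column(0, 0, 30)
    # Approximate Calibri 11 conversion: about 7 px per character plus 5 px padding.
    worksheet.set_column(1, 1, (width_px - 5) / 7.0)
    for row, smiles in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smiles)
        worksheet.write_string(row, 0, smiles)
        if mol is None:
            continue  # leave the image cell empty for unparsable SMILES
        worksheet.set_row(row, height_px * 0.75)  # row height is given in points
        image_data = BytesIO()
        Draw.MolToImage(mol, size=size).save(image_data, format='PNG')
        image_data.seek(0)
        # A file name is still required, but the image bytes come from image_data.
        worksheet.insert_image(row, 1, 'mol_%d.png' % row, {'image_data': image_data})
    workbook.close()
```

For example, write_mols_to_xlsx(['CCO', 'c1ccccc1O'], 'molecules.xlsx') would write a two-row sheet with one depiction per row. A DataFrame version would iterate over the molecule column in the same way.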
[Rdkit-discuss] Torsion fingerprint deviation
Dear all,

I’m happy to announce that the torsion fingerprint deviation (TFD) developed in Rarey’s group (J. Chem. Inf. Model., 52, 1499, 2012) is now available in the Python part of the RDKit. Many thanks to Gregori and Ilenia (+ Christin Schäfer) for their help at the UGM Hackathon in resolving the last disagreements with the paper!

Here are some small examples of the usage; more detailed documentation can be found in the Cookbook.

There are three wrapper functions for convenience:

TFD between two sets of conformers of a molecule:

    from rdkit.Chem import TorsionFingerprints
    tfd = TorsionFingerprints.GetTFDBetweenConformers(mol, confIds1=[0, 2], confIds2=[1, 3])

The result is a list of TFD values, one per pair of conformers (in this case 4).

TFD between two instances of the same molecule with different conformers. If no confIds are specified, the first conformer of each molecule is taken.

    tfd = TorsionFingerprints.GetTFDBetweenMolecules(mol1, mol2)

The result is a list containing, in this case, a single TFD value.

For clustering or diversity picking purposes, there is also a convenience function to get the matrix of TFD values:

    tfdmat = TorsionFingerprints.GetTFDMatrix(mol)

The different steps of the TFD calculation can also be accessed independently:

1) Two lists of torsions (one for non-ring bonds, one for ring bonds) are generated. For each torsion, the indices of the four atoms are stored. This has to be done only once for a molecule.
2) The weights for the torsions are calculated. By default, the bond in the centre of the molecule receives the highest weight, and the other weights decrease with the distance from the central bond. If another part of the molecule should have the highest weight, the user can also specify two atom indices that represent the most important bond. Again, this step has to be done only once for a molecule.
3) The torsion angles are calculated for each conformer of interest, given the torsion lists.
4) The TFD value between two conformers is calculated, given the torsion angles and weights.

    from rdkit.Chem import TorsionFingerprints

    # Steps 1 and 2: once per molecule
    tors_list, ring_tors_list = TorsionFingerprints.CalculateTorsionLists(mol)
    weights = TorsionFingerprints.CalculateTorsionWeights(mol)
    # Step 3: once per conformer
    torsions1 = TorsionFingerprints.CalculateTorsionAngles(mol, tors_list, ring_tors_list, confId=0)
    torsions2 = TorsionFingerprints.CalculateTorsionAngles(mol, tors_list, ring_tors_list, confId=1)
    # Step 4: once per pair of conformers
    tfd = TorsionFingerprints.CalculateTFD(torsions1, torsions2, weights=weights)

Let me know if you encounter any problems.

Best,
Sereina
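Step 4 boils down to a weighted average of normalized torsion-angle deviations. The sketch below is a simplified, stand-alone illustration of that idea in plain Python, not the RDKit implementation: each torsion is given as a list of equivalent angles in degrees (to mimic symmetric torsions), the smallest circular difference between the two conformers is divided by an assumed maximal deviation of 180°, and the results are averaged with the torsion weights. The function names are hypothetical.

```python
def circular_deviation(a, b):
    """Smallest absolute difference between two angles in degrees (direction ignored)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def tfd_sketch(torsions1, torsions2, weights=None, max_dev=180.0):
    """Hypothetical, simplified TFD: weighted mean of normalized torsion deviations.

    Each entry of torsions1/torsions2 is a list of equivalent angles for one torsion;
    the smallest deviation over all combinations is used. max_dev is an assumed
    normalization constant applied to every torsion.
    """
    if len(torsions1) != len(torsions2):
        raise ValueError("torsion lists must have the same length")
    deviations = []
    for angles1, angles2 in zip(torsions1, torsions2):
        best = min(circular_deviation(a, b) for a in angles1 for b in angles2)
        deviations.append(best / max_dev)
    if weights is None:
        weights = [1.0] * len(deviations)  # unweighted: plain mean
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * d for w, d in zip(weights, deviations)) / total


# Two torsions; the second one is weighted half as much as the first.
print(round(tfd_sketch([[60.0], [180.0]], [[-60.0], [170.0]], weights=[1.0, 0.5]), 4))  # 0.463
```

In this sketch, identical conformers give 0 and a 180° deviation in every torsion gives 1, which is what makes the weighted, normalized value comparable between conformer pairs.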