Re: [Rdkit-discuss] number of significant digits in molblock?
6 digits seems perfectly fine for me.

On Fri, 5 Oct 2018 at 14:26, Greg Landrum wrote:
>
> On Fri, Oct 5, 2018 at 2:42 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> In the newer "V3000", the atom line is not column-based, which I believe
>> gives more freedom to implementers to decide the precision of the
>> coordinates. You can force RDKit to write in this format by calling
>> SetForceV3000(True) on your writer object. I tried it and I get 5 digits
>> after the decimal point instead of 4, so at least that's a start. Looking
>> at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
>> coordinates without setting the precision, so what you get is the default
>> stringstream conversion. Here's where one could in principle adjust this
>> precision, but there's clearly no API to do so at the moment.
>>
>
> Yep. This is not currently possible without editing C++ code.
> If there is a real use case for having more than 6 sig figs for atomic
> positions (this is what is currently available), we can certainly come up
> with a way to make it happen. I don't recall having seen any real-world
> examples where that would be desirable.
>

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] number of significant digits in molblock?
On Fri, Oct 5, 2018 at 2:42 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> In the newer "V3000", the atom line is not column-based, which I believe
> gives more freedom to implementers to decide the precision of the
> coordinates. You can force RDKit to write in this format by calling
> SetForceV3000(True) on your writer object. I tried it and I get 5 digits
> after the decimal point instead of 4, so at least that's a start. Looking
> at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
> coordinates without setting the precision, so what you get is the default
> stringstream conversion. Here's where one could in principle adjust this
> precision, but there's clearly no API to do so at the moment.
>

Yep. This is not currently possible without editing C++ code.
If there is a real use case for having more than 6 sig figs for atomic
positions (this is what is currently available), we can certainly come up
with a way to make it happen. I don't recall having seen any real-world
examples where that would be desirable.
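
The "6 sig figs" limit can be illustrated without RDKit. The sketch below assumes that the default C++ stream conversion of a double behaves like printf's %g with a precision of 6, and imitates it with Python's "g" format. The helper name stream_default is made up for this illustration and is not part of RDKit.

```python
def stream_default(value: float) -> str:
    """Imitate default C++ stream output of a double (6 significant digits).

    Assumption: the default conversion behaves like printf("%g") with
    precision 6. This is an illustration only, not RDKit code.
    """
    return format(value, ".6g")


# The number of digits after the decimal point depends on the magnitude,
# because the limit is on significant digits, not on decimal places.
for v in (0.123456789, 1.23456789, 12.3456789, 0.2, -5.55112e-17):
    print(repr(v), "->", stream_default(v))
```

Under that assumption, a coordinate with one digit before the decimal point keeps 5 decimals, which is consistent with the "5 digits after the decimal point" reported in the quoted message.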
Re: [Rdkit-discuss] number of significant digits in molblock?
Hi Michal,

The old SDF format (aka V2000 CTAB) is column-based, as things often were
in the era of Fortran 77 and punch cards. Not only the precision but also
the exact position of each value on the line is specified! Here's what the
spec says:

    The Atom Block is made up of atom lines, one line per atom with the
    following format:

    xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee

which explains why you see four digits after the decimal point. Also note
that in a huge blow to readability, no spaces are required between the
coordinates; if you have coordinates with five digits before the decimal
point, the numbers run into each other, and if you have even more digits,
the number doesn't even fit! There are also limits on the number of atoms
for similar reasons. But I digress...

In the newer "V3000", the atom line is not column-based, which I believe
gives more freedom to implementers to decide the precision of the
coordinates. You can force RDKit to write in this format by calling
SetForceV3000(True) on your writer object. I tried it and I get 5 digits
after the decimal point instead of 4, so at least that's a start. Looking
at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
coordinates without setting the precision, so what you get is the default
stringstream conversion. Here's where one could in principle adjust this
precision, but there's clearly no API to do so at the moment.

Hope this helps,
Ivan

On Fri, Oct 5, 2018 at 5:44 AM Michal Krompiec wrote:

> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal
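
The column layout described above can be sketched in a few lines of Python. This assumes each coordinate occupies a 10-character field with 4 decimals, as in the atom line format quoted from the spec. The helper name v2000_xyz is invented for the illustration and is not an RDKit function.

```python
def v2000_xyz(x: float, y: float, z: float) -> str:
    """Format coordinates in three 10-character fields with 4 decimals.

    Assumption: this mirrors the xxxxx.xxxx layout of a V2000 atom line.
    Hypothetical helper for illustration only.
    """
    return "".join(f"{c:10.4f}" for c in (x, y, z))


# Ordinary coordinates: padded to 10 characters each, 4 decimals kept.
print(repr(v2000_xyz(0.123456789, 0.2, 0.3)))

# Five digits before the decimal point: each field is completely full,
# so the three numbers run into each other with no separating space.
print(repr(v2000_xyz(12345.6789, 12345.6789, 12345.6789)))

# Six digits before the decimal point: the value needs 11 characters and
# no longer fits in its 10-character field.
print(len(f"{123456.789:10.4f}"))
```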
Re: [Rdkit-discuss] number of significant digits in molblock?
Hi Jan,

Thanks, 6 digits is OK! Forcing V3000 did the trick:

sdf_out = Chem.SDWriter(outfile)
sdf_out.SetForceV3000(True)

Best,
Michal

On Fri, 5 Oct 2018 at 12:59, Jan Holst Jensen wrote:

> Hi Michal,
>
> V2000 format is restricted by its specification to fixed format with 4
> decimals. V3000 output is not restricted to a fixed format, but the current
> code still rounds it in practice as seen below.
>
> To get extra precision you could change the formatting of x, y, and z
> coordinate output in Code/GraphMol/FileParsers/MolFileWriter.cpp, function
> GetV3000MolFileAtomLine(),
> look for the
>
>     ss << " " << x << " " << y << " " << z;
>
> line. Adding extra digits to the X, Y, and Z coordinates *should* not
> cause issues for compliant V3000 readers.
>
> Cheers
> -- Jan
>
> >>> import rdkit
> >>> from rdkit import Chem
> >>> from rdkit.Chem import AllChem
> >>> m = Chem.MolFromSmiles('CC')
> >>> AllChem.Compute2DCoords(m)
> 0
> >>> m.GetConformer(0).SetAtomPosition(0, rdkit.Geometry.Point3D(0.123456789, 0.2, 0.3))
> >>> print(Chem.MolToMolBlock(m))
>
>      RDKit          2D
>
>   2  1  0  0  0  0  0  0  0  0999 V2000
>     0.1235    0.2000    0.3000 C   0  0  0  0  0  0  0  0  0  0  0  0   <== 4 decimal digits
>     0.7500   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
> M  END
>
> >>> print(Chem.MolToMolBlock(m, forceV3000=True))
>
>      RDKit          2D
>
>   0  0  0  0  0  0  0  0  0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 2 1 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 C 0.123457 0.2 0.3 0   <== 6 significant digits
> M  V30 2 C 0.75 -5.55112e-17 0 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 1 1 2
> M  V30 END BOND
> M  V30 END CTAB
> M  END
>
> >>>
>
> On 2018-10-05 11:42, Michal Krompiec wrote:
>
> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal
Re: [Rdkit-discuss] number of significant digits in molblock?
Hi Michal,

V2000 format is restricted by its specification to fixed format with 4
decimals. V3000 output is not restricted to a fixed format, but the current
code still rounds it in practice as seen below.

To get extra precision you could change the formatting of x, y, and z
coordinate output in Code/GraphMol/FileParsers/MolFileWriter.cpp, function
GetV3000MolFileAtomLine(), look for the

    ss << " " << x << " " << y << " " << z;

line. Adding extra digits to the X, Y, and Z coordinates *should* not
cause issues for compliant V3000 readers.

Cheers
-- Jan

>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> m = Chem.MolFromSmiles('CC')
>>> AllChem.Compute2DCoords(m)
0
>>> m.GetConformer(0).SetAtomPosition(0, rdkit.Geometry.Point3D(0.123456789, 0.2, 0.3))
>>> print(Chem.MolToMolBlock(m))

     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.1235    0.2000    0.3000 C   0  0  0  0  0  0  0  0  0  0  0  0   <== 4 decimal digits
    0.7500   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END

>>> print(Chem.MolToMolBlock(m, forceV3000=True))

     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 2 1 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0.123457 0.2 0.3 0   <== 6 significant digits
M  V30 2 C 0.75 -5.55112e-17 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 END BOND
M  V30 END CTAB
M  END

>>>

On 2018-10-05 11:42, Michal Krompiec wrote:

> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal
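
For anyone who cannot rebuild the C++ code, one possible workaround (not suggested in the thread, and not an RDKit API) is to post-process the V3000 text and rewrite the coordinates from values you already hold in memory. The sketch below is a rough illustration under several assumptions: atom lines look like "M  V30 index type x y z ...", with no continuation lines and no spaces inside the atom type field. The function name rewrite_v3000_coords is hypothetical.

```python
def rewrite_v3000_coords(molblock, coords, digits=10):
    """Return molblock with the x, y, z fields of its V3000 atom lines
    replaced by the values in coords, written with `digits` decimals.

    Assumptions (not checked): one atom per line, no continuation lines,
    no whitespace inside the atom type field, coords in atom order.
    """
    coords = list(coords)
    out = []
    in_atoms = False
    atom_idx = 0
    for line in molblock.splitlines():
        if line.startswith("M  V30 BEGIN ATOM"):
            in_atoms = True
        elif line.startswith("M  V30 END ATOM"):
            in_atoms = False
        elif in_atoms:
            # fields: "M", "V30", index, type, x, y, z, then the rest
            fields = line.split()
            fields[4:7] = [f"{v:.{digits}f}" for v in coords[atom_idx]]
            atom_idx += 1
            line = "M  " + " ".join(fields[1:])
        out.append(line)
    return "\n".join(out)


# Minimal demonstration on the atom block from the session above.
sample = "\n".join([
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0.123457 0.2 0.3 0",
    "M  V30 2 C 0.75 -5.55112e-17 0 0",
    "M  V30 END ATOM",
])
print(rewrite_v3000_coords(sample, [(0.123456789, 0.2, 0.3), (0.75, 0.0, 0.0)]))
```

Whether a given V3000 reader accepts the longer numbers is something to verify for the toolchain in use; the message above only says that it *should* not cause issues for compliant readers.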