Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Michal Krompiec
6 digits seems perfectly fine for me.

On Fri, 5 Oct 2018 at 14:26, Greg Landrum wrote:

>
> On Fri, Oct 5, 2018 at 2:42 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> In the newer "V3000", the atom line is not column-based, which I believe
>> gives more freedom to implementers to decide the precision of the
>> coordinates. You can force RDKit to write in this format by calling
>> SetForceV3000(True) on your writer object. I tried it and I get 5 digits
>> after the decimal point instead of 4, so at least that's a start. Looking
>> at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
>> coordinates without setting the precision, so what you get is the default
>> stringstream conversion. Here's where one could in principle adjust this
>> precision, but there's clearly no API to do so at the moment.
>>
>
> Yep. This is not currently possible without editing C++ code.
> If there is a real use case for having more than 6 sig figs for atomic
> positions (this is what is currently available), we can certainly come up
> with a way to make it happen. I don't recall having seen any real-world
> examples where that would be desirable.
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Greg Landrum
On Fri, Oct 5, 2018 at 2:42 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> In the newer "V3000", the atom line is not column-based, which I believe
> gives more freedom to implementers to decide the precision of the
> coordinates. You can force RDKit to write in this format by calling
> SetForceV3000(True) on your writer object. I tried it and I get 5 digits
> after the decimal point instead of 4, so at least that's a start. Looking
> at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
> coordinates without setting the precision, so what you get is the default
> stringstream conversion. Here's where one could in principle adjust this
> precision, but there's clearly no API to do so at the moment.
>

Yep. This is not currently possible without editing C++ code.
If there is a real use case for having more than 6 sig figs for atomic
positions (this is what is currently available), we can certainly come up
with a way to make it happen. I don't recall having seen any real-world
examples where that would be desirable.


Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Ivan Tubert-Brohman
Hi Michal,

The old SDF format (aka V2000 CTAB) is column-based, as things often were
in the era of Fortran 77 and punch cards. Not only the precision but also
the exact position of each value on the line is specified! Here's what the
spec says:

The Atom Block is made up of atom lines, one line per atom with the
following format:

xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee

which explains why you see four digits after the decimal point. Also note
that in a huge blow to readability, no spaces are required between the
coordinates; if you have coordinates with five digits before the decimal
point, the numbers run into each other, and if you have even more digits,
the number doesn't even fit! There are also limits on the number of atoms
for similar reasons. But I digress...
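
To make the column layout concrete, here is a small Python sketch (not RDKit
code) that formats coordinates as the V2000 atom line prescribes: three
fields, each 10 characters wide with 4 decimals, which corresponds to a
C-style %10.4f. The helper name v2000_coords is made up for illustration.

```python
def v2000_coords(x, y, z):
    """Format x, y, z as three fixed-width V2000 fields (10 chars, 4 decimals)."""
    return "".join("%10.4f" % v for v in (x, y, z))


# Typical coordinates: the padding makes the fields look space-separated.
print(repr(v2000_coords(0.123456789, 0.2, 0.3)))
# '    0.1235    0.2000    0.3000'

# Five digits before the decimal point fill the whole field, so the
# numbers run into each other.
print(repr(v2000_coords(12345.6789, 23456.789, 34567.89)))
# '12345.678923456.789034567.8900'

# Six digits before the decimal point need 11 characters and no longer fit.
print(len("%10.4f" % 123456.789))
# 11
```

A reader of such a file has to slice each line by column position rather
than split on whitespace, which is why the precision cannot simply be
increased in V2000.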

In the newer "V3000", the atom line is not column-based, which I believe
gives more freedom to implementers to decide the precision of the
coordinates. You can force RDKit to write in this format by calling
SetForceV3000(True) on your writer object. I tried it and I get 5 digits
after the decimal point instead of 4, so at least that's a start. Looking
at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
coordinates without setting the precision, so what you get is the default
stringstream conversion. Here's where one could in principle adjust this
precision, but there's clearly no API to do so at the moment.
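
As a rough illustration of what "default stringstream conversion" means in
practice, the Python sketch below mimics it with the %g format. This rests
on the assumption that a C++ stream with default settings writes a double
like printf's %g with a precision of 6, i.e. at most 6 significant digits.
The function name default_stream_format is hypothetical.

```python
def default_stream_format(value):
    """Emulate writing a double to a C++ stream with default settings.

    Assumption: the default conversion behaves like printf's %g with a
    precision of 6, i.e. at most 6 significant digits.
    """
    return "%g" % value


# The number of digits after the decimal point depends on the magnitude.
for v in (0.123456789, 1.23456789, 12.3456789, 0.75, -5.551115123125783e-17):
    print(v, "->", default_stream_format(v))
```

Under that assumption, a coordinate between 1 and 10 comes out with 5
digits after the decimal point, a coordinate below 1 with 6, and trailing
zeros are dropped. Tiny values switch to exponent notation.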

Hope this helps,
Ivan


On Fri, Oct 5, 2018 at 5:44 AM Michal Krompiec wrote:

> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal


Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Michal Krompiec
Hi Jan,
Thanks, 6 digits is OK! Forcing V3000 did the trick:
from rdkit import Chem
sdf_out = Chem.SDWriter(outfile)
sdf_out.SetForceV3000(True)

Best,
Michal

On Fri, 5 Oct 2018 at 12:59, Jan Holst Jensen wrote:

> Hi Michal,
>
> V2000 format is restricted by its specification to fixed format with 4
> decimals. V3000 output is not restricted to a fixed format, but the current
> code still rounds it in practice as seen below.
>
> To get extra precision you could change the formatting of x, y, and z
> coordinate output in Code/GraphMol/FileParsers/MolFileWriter.cpp, function 
> GetV3000MolFileAtomLine(),
> look for the
>
> ss << " " << x << " " << y << " " << z;
>
> line. Adding extra digits to the X, Y, and Z coordinates *should* not
> cause issues for compliant V3000 readers.
>
> Cheers
> -- Jan
>
> >>> import rdkit
> >>> from rdkit import Chem
> >>> from rdkit.Chem import AllChem
> >>> m = Chem.MolFromSmiles('CC')
> >>> AllChem.Compute2DCoords(m)
> 0
> >>> m.GetConformer(0).SetAtomPosition(0, rdkit.Geometry.Point3D(0.123456789, 0.2, 0.3))
> >>> print(Chem.MolToMolBlock(m))
>  RDKit  2D
>
>   2  1  0  0  0  0  0  0  0  0999 V2000
>     0.1235    0.2000    0.3000 C   0  0  0  0  0  0  0  0  0  0  0  0    <== 4 decimal digits
>     0.7500   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
> M  END
>
> >>> print(Chem.MolToMolBlock(m, forceV3000=True))
>
>  RDKit  2D
>
>   0  0  0  0  0  0  0  0  0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 2 1 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 C 0.123457 0.2 0.3 0    <== 6 significant digits
> M  V30 2 C 0.75 -5.55112e-17 0 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 1 1 2
> M  V30 END BOND
> M  V30 END CTAB
> M  END
>
> >>>
>
> On 2018-10-05 11:42, Michal Krompiec wrote:
>
> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal
>
>
>


Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Jan Holst Jensen

Hi Michal,

V2000 format is restricted by its specification to fixed format with 4 
decimals. V3000 output is not restricted to a fixed format, but the 
current code still rounds it in practice as seen below.


To get extra precision you could change the formatting of x, y, and z 
coordinate output in Code/GraphMol/FileParsers/MolFileWriter.cpp, 
function GetV3000MolFileAtomLine(), look for the


    ss << " " << x << " " << y << " " << z;

line. Adding extra digits to the X, Y, and Z coordinates *should* not 
cause issues for compliant V3000 readers.
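
If recompiling is not an option, one possible workaround (not something
RDKit provides, just a sketch) is to post-process the V3000 text in Python
and overwrite the x, y, z fields with full-precision values that you hold
yourself, for example the ones you passed to SetAtomPosition. The helper
rewrite_v3000_coords and its digits parameter are hypothetical; it assumes
one atom per line, no "-" continuation lines, no spaces inside the atom
type field, and that coords is in the same order as the atom block.

```python
def rewrite_v3000_coords(molblock, coords, digits=10):
    """Return molblock with the x, y, z fields of each V3000 atom line
    replaced by the values in coords, written with `digits` decimals.

    Hypothetical helper; assumes one atom per line (no "-" continuation
    lines) and that coords is ordered like the atom block.
    """
    out = []
    in_atoms = False
    atom_idx = 0
    for line in molblock.splitlines():
        if line.startswith("M  V30 BEGIN ATOM"):
            in_atoms = True
        elif line.startswith("M  V30 END ATOM"):
            in_atoms = False
        elif in_atoms:
            # Fields: M, V30, index, type, x, y, z, aamap, [properties...]
            fields = line.split()
            fields[4:7] = ["%.*f" % (digits, c) for c in coords[atom_idx]]
            line = "M  V30 " + " ".join(fields[2:])
            atom_idx += 1
        out.append(line)
    return "\n".join(out)


# Atom block taken from the V3000 output shown in this thread.
atom_block = "\n".join([
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0.123457 0.2 0.3 0",
    "M  V30 2 C 0.75 -5.55112e-17 0 0",
    "M  V30 END ATOM",
])
print(rewrite_v3000_coords(atom_block, [(0.123456789, 0.2, 0.3), (0.75, 0.0, 0.0)]))
```

Whether a given reader keeps the extra digits is a separate question, so it
is worth reading the file back and checking the coordinates.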


Cheers
-- Jan

>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> m = Chem.MolFromSmiles('CC')
>>> AllChem.Compute2DCoords(m)
0
>>> m.GetConformer(0).SetAtomPosition(0, rdkit.Geometry.Point3D(0.123456789, 0.2, 0.3))
>>> print(Chem.MolToMolBlock(m))
 RDKit  2D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.1235    0.2000    0.3000 C   0  0  0  0  0  0  0  0  0  0  0  0    <== 4 decimal digits
    0.7500   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END

>>> print(Chem.MolToMolBlock(m, forceV3000=True))

 RDKit  2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 2 1 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0.123457 0.2 0.3 0    <== 6 significant digits
M  V30 2 C 0.75 -5.55112e-17 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 END BOND
M  V30 END CTAB
M  END

>>>

On 2018-10-05 11:42, Michal Krompiec wrote:

Hello,
Is it possible to control the number of significant digits of XYZ 
coordinates? I am modifying coordinates of my molecules 
using SetAtomPosition but when I save them into an SDF it seems that 
the precision is limited to 4 digits after the decimal point (I'd like 
10 instead...).

Best wishes,
Michal



