Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Guillaume GODIN
Here, Dragon results for the 3 molecules: I've included both  Whim and 3D 
descriptors but I don't have access to PMi!


I found the second document in agreement with Peter answer...


BR,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Peter Gedeck 
Envoyé : dimanche 15 janvier 2017 15:07
À : Greg Landrum; RDKit Discuss; Guillaume GODIN
Objet : Re: [Rdkit-discuss] PMI API

According to this:
https://en.wikipedia.org/wiki/List_of_moments_of_inertia
The moments of inertia of a disk (something like benzene) are:

Iz = mr^2/2
Ix = Iy = mr^2/4

None of them is zero. The smallest moment of inertia of a rod-like molecule 
(e.g. C#C) is zero.

Best,

Peter



On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum 
> wrote:
Hi Guillaume,

I think it this case it's something else. According to the Todeschini article 
the smallest moment of inertia of a planar molecule like benzene should be 
zero. The eigenvalues of the inertia matrix for benzene, however, are 
definitely not zero (and not close enough that it's likely to be round-off 
error).
It would be very nice if you could run the three files I mention through Dragon 
and let me know what it calculates for those descriptors.

-greg


_
From: Guillaume GODIN 
>
Sent: Sunday, January 15, 2017 1:11 PM
Subject: RE: [Rdkit-discuss] PMI API
To: Greg Landrum >, RDKit 
Discuss 
>,
 Chris Earnshaw >




Dear Greg,


I  suspect that it's a precision error or eigen algorithm shift between rdkit 
c++ & dragon.


To obtain good value, I suggest to try to implement a test on the eigen values 
like i did in gateway.cpp implementation.



JacobiSVD getSVD(MatrixXd A) {

JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);

return mysvd;

}


// get the A-1 matrix using

MatrixXd GetPinv(MatrixXd A){

JacobiSVD svd = getSVD(A);

double  pinvtoler=1.e-2;// choose your tolerance wisely!

VectorXd vs=svd.singularValues();

VectorXd vsinv=svd.singularValues();


for (unsignedint i=0; i pinvtoler )

   vsinv(i)=1.0/vs(i);

   else vsinv(i)=0.0;

}


MatrixXd S =  vsinv.asDiagonal();

MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();

return Ap;

}


If it's not solve the problem, I would like to test it in Matlab. can you 
provide me the 3 (3d xyz matrix) of your example please ?


I also have Dragon 6


best regards,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Greg Landrum >
Envoyé : dimanche 15 janvier 2017 11:50
À : Chris Earnshaw; RDKit Discuss
Objet : Re: [Rdkit-discuss] PMI API

I managed to make some time to look into this this weekend and I've found a bug 
and something I don't understand. Hopefully the community can help out here.
On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
4) The big one! The returned results look very odd. They appear to relate more 
to the dimensions of the molecule than the moments of inertia. For a rod-like 
molecule (dimethylacetylene) I'd expect two large and one small PMI (e.g. PMI1: 
6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828  NPR2: 0.98) but 
actually get PMI1: 0.061647  PMI2: 0.061652  PMI3: 25.3699  NPR1: 0.002430  
NPR2: 0.002430.
For disk-like (benzene) the result should be one large and two medium (e.g. 
PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2: 0.500013) 
but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1: 2.14213e-11  
NPR2: 0.33.
Finally for a roughly spherical molecule (neopentane) the NPR values look 
reasonable (no great surprise) but the absolute PMI values may be too small: 
old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2: 6.59488  
PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35

Your expectations are correct: the current RDKit implementation is wrong. The 
corresponding github entry is here: https://github.com/rdkit/rdkit/issues/1262
This is due to a mistake in the way the principal moments are calculated (which 

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Chris Earnshaw
Thanks Greg

I've built a version of RDKit with fixes from https://github.com/
greglandrum/rdkit/tree/fix/github1262 and can confirm that it gives exactly
the same values of PMI and NPR that I got with the RDKit fork by 'hahnda6'.
I can't say for certain that the PMI values are correct in absolute terms,
but the NPR values are certainly what would be expected for those test
molecules.

I'm worried about the Todeschini paper - I think there are errors in some
of the equations and inconsistencies in the discussion, some of which may
involve mixing up PMIs with eigenvalues of the covariance matrix.
Unfortunately I don't have access to the original references so can't check
in detail, but I'd be disinclined to take any of the equations at face
value.

Chris

On 15 January 2017 at 10:50, Greg Landrum  wrote:

> I managed to make some time to look into this this weekend and I've found
> a bug and something I don't understand. Hopefully the community can help
> out here.
>
> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
>
>> 4) The big one! The returned results look very odd. They appear to relate
>> more to the dimensions of the molecule than the moments of inertia. For a
>> rod-like molecule (dimethylacetylene) I'd expect two large and one small
>> PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828
>> NPR2: 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:
>> 25.3699  NPR1: 0.002430  NPR2: 0.002430.
>> For disk-like (benzene) the result should be one large and two medium
>> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2:
>> 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1:
>> 2.14213e-11  NPR2: 0.33.
>> Finally for a roughly spherical molecule (neopentane) the NPR values look
>> reasonable (no great surprise) but the absolute PMI values may be too
>> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
>> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2:
>> 6.59488  PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>>
>
> Your expectations are correct: the current RDKit implementation is wrong.
> The corresponding github entry is here: https://github.com/
> rdkit/rdkit/issues/1262
> This is due to a mistake in the way the principal moments are calculated
> (which is due to the fact that I don't spend a lot of time working
> with/thinking about 3D descriptors). Instead of using the
> eigenvectors/eigenvalues of the inertia matrix (the tensor of inertia) the
> RDKit is currently using the covariance matrix. There's some more on the
> relationship between these two here: http://number-none.com/
> blow/inertia/deriving_i.html
>
> The problem is easy to fix (and I have something working here:
> https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws
> up the values of the descriptors that are derived from here:
> Todeschini and Consoni "Descriptors from Molecular Geometry" Handbook of
> Chemoinformatics http://dx.doi.org/10.1002/9783527618279.ch37
> These include the radius of gyration, inertial shape factor, etc.
> Within that article they state that Ic = 0 for planar molecules. Ignoring
> the inequality on page 1010, which says that Ic is the largest moment and
> is contradicted by the rest of the text (particularly the inequalities on
> page 1011), Ic corresponds to the smallest principal moment : PMI1.
>
> So now I'm confused, but I'm hoping this is obvious to someone versed in
> the field: I'd like to reproduce the descriptors described in the
> Todeschini article, but I clearly can't do that using the actual moments of
> inertia. I could keep using the eigenvalues of the covariance matrix there,
> but that doesn't match what's described in the text.
>
> Two things that would be extremely helpful:
> 1) an explanation of the disconnect here from someone who knows this
> stuff, I would guess that it's pretty simple
> 2) The results of running the files github1262_1.mol, github1262_2.mol,
> and github1262_3.mol from here: https://github.com/
> greglandrum/rdkit/tree/fix/github1262/Code/GraphMol/
> MolTransforms/test_data through Dragon and calculating the radius of
> gyration, inertial shape factor, eccentricity, molecular asphericity, and
> spherocity index.
>
> Best,
> -greg
>
>
>
>>
>
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Greg Landrum
On Sun, Jan 15, 2017 at 5:15 PM, Chris Earnshaw 
wrote:

>
> I've built a version of RDKit with fixes from https://github.com/
> greglandrum/rdkit/tree/fix/github1262 and can confirm that it gives
> exactly the same values of PMI and NPR that I got with the RDKit fork by
> 'hahnda6'. I can't say for certain that the PMI values are correct in
> absolute terms, but the NPR values are certainly what would be expected for
> those test molecules.
>

Glad to hear it.



> I'm worried about the Todeschini paper - I think there are errors in some
> of the equations and inconsistencies in the discussion, some of which may
> involve mixing up PMIs with eigenvalues of the covariance matrix.
> Unfortunately I don't have access to the original references so can't check
> in detail, but I'd be disinclined to take any of the equations at face
> value.
>

Ok. I'm going to have to see if I can track down some additional references
and work from there.


-greg
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Greg Landrum
Thanks Guillaume!

On Sun, Jan 15, 2017 at 5:01 PM, Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Here, Dragon results for the 3 molecules: I've included both  Whim and 3D
> descriptors but I don't have access to PMi!
>
>
> I found the second document in agreement with Peter answer...
>
>
> BR,
>
> *Dr. Guillaume GODIN*
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R DIVISION
> DIRECT LINE +41 (0)22 780 3645 <+41%2022%20780%2036%2045>
> MOBILE  +41 (0)79 536 1039 <+41%2079%20536%2010%2039>
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
> --
> *De :* Peter Gedeck 
> *Envoyé :* dimanche 15 janvier 2017 15:07
> *À :* Greg Landrum; RDKit Discuss; Guillaume GODIN
>
> *Objet :* Re: [Rdkit-discuss] PMI API
>
> According to this:
> https://en.wikipedia.org/wiki/List_of_moments_of_inertia
> The moments of inertia of a disk (something like benzene) are:
>
> Iz = mr^2/2
> Ix = Iy = mr^2/4
>
> None of them is zero. The smallest moment of inertia of a rod-like
> molecule (e.g. C#C) is zero.
>
> Best,
>
> Peter
>
>
>
> On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum 
> wrote:
>
>> Hi Guillaume,
>>
>> I think it this case it's something else. According to the Todeschini
>> article the smallest moment of inertia of a planar molecule like benzene
>> should be zero. The eigenvalues of the inertia matrix for benzene, however,
>> are definitely not zero (and not close enough that it's likely to be
>> round-off error).
>> It would be very nice if you could run the three files I mention through
>> Dragon and let me know what it calculates for those descriptors.
>>
>> -greg
>>
>>
>> _
>> From: Guillaume GODIN 
>> Sent: Sunday, January 15, 2017 1:11 PM
>> Subject: RE: [Rdkit-discuss] PMI API
>> To: Greg Landrum , RDKit Discuss <
>> rdkit-discuss@lists.sourceforge.net>, Chris Earnshaw <
>> cgearns...@gmail.com>
>>
>>
>>
>> Dear Greg,
>>
>>
>> I  suspect that it's a precision error or eigen algorithm shift between
>> rdkit c++ & dragon.
>>
>>
>> To obtain good value, I suggest to try to implement a test on the eigen
>> values like i did in gateway.cpp implementation.
>>
>>
>>
>> JacobiSVD getSVD(MatrixXd A) {
>>
>> JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);
>>
>> return mysvd;
>>
>> }
>>
>>
>> // get the A-1 matrix using
>>
>> MatrixXd GetPinv(MatrixXd A){
>>
>> JacobiSVD svd = getSVD(A);
>>
>> double  pinvtoler=1.e-2;// choose your tolerance wisely!
>>
>> VectorXd vs=svd.singularValues();
>>
>> VectorXd vsinv=svd.singularValues();
>>
>>
>> for (unsignedint i=0; i>
>> if ( vs(i) > pinvtoler )
>>
>>vsinv(i)=1.0/vs(i);
>>
>>else vsinv(i)=0.0;
>>
>> }
>>
>>
>> MatrixXd S =  vsinv.asDiagonal();
>>
>> MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();
>>
>> return Ap;
>>
>> }
>>
>>
>> If it's not solve the problem, I would like to test it in Matlab. can you
>> provide me the 3 (3d xyz matrix) of your example please ?
>>
>>
>> I also have Dragon 6
>>
>>
>> best regards,
>>
>> *Dr. Guillaume GODIN*
>> Principal Scientist
>> Chemoinformatic & Datamining
>> Innovation
>> CORPORATE R DIVISION
>> DIRECT LINE +41 (0)22 780 3645 <022%20780%2036%2045>
>> MOBILE  +41 (0)79 536 1039 <079%20536%2010%2039>
>> Firmenich SA
>> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>>
>> --
>> *De :* Greg Landrum 
>> *Envoyé :* dimanche 15 janvier 2017 11:50
>> *À :* Chris Earnshaw; RDKit Discuss
>> *Objet :* Re: [Rdkit-discuss] PMI API
>>
>> I managed to make some time to look into this this weekend and I've found
>> a bug and something I don't understand. Hopefully the community can help
>> out here.
>> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
>> wrote:
>>
>> 4) The big one! The returned results look very odd. They appear to relate
>> more to the dimensions of the molecule than the moments of inertia. For a
>> rod-like molecule (dimethylacetylene) I'd expect two large and one small
>> PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828
>> NPR2: 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:
>> 25.3699  NPR1: 0.002430  NPR2: 0.002430.
>> For disk-like (benzene) the result should be one large and two medium
>> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2:
>> 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1:
>> 2.14213e-11  NPR2: 0.33.
>> Finally for a roughly spherical molecule (neopentane) the NPR values look
>> reasonable (no great surprise) but the absolute PMI values may be too
>> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
>> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Greg Landrum
I managed to make some time to look into this this weekend and I've found a
bug and something I don't understand. Hopefully the community can help out
here.

On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
wrote:

> 4) The big one! The returned results look very odd. They appear to relate
> more to the dimensions of the molecule than the moments of inertia. For a
> rod-like molecule (dimethylacetylene) I'd expect two large and one small
> PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828
> NPR2: 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:
> 25.3699  NPR1: 0.002430  NPR2: 0.002430.
> For disk-like (benzene) the result should be one large and two medium
> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2:
> 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1:
> 2.14213e-11  NPR2: 0.33.
> Finally for a roughly spherical molecule (neopentane) the NPR values look
> reasonable (no great surprise) but the absolute PMI values may be too
> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2:
> 6.59488  PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>

Your expectations are correct: the current RDKit implementation is wrong.
The corresponding github entry is here:
https://github.com/rdkit/rdkit/issues/1262
This is due to a mistake in the way the principal moments are calculated
(which is due to the fact that I don't spend a lot of time working
with/thinking about 3D descriptors). Instead of using the
eigenvectors/eigenvalues of the inertia matrix (the tensor of inertia) the
RDKit is currently using the covariance matrix. There's some more on the
relationship between these two here:
http://number-none.com/blow/inertia/deriving_i.html

The problem is easy to fix (and I have something working here:
https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws up
the values of the descriptors that are derived from here:
Todeschini and Consoni "Descriptors from Molecular Geometry" Handbook of
Chemoinformatics http://dx.doi.org/10.1002/9783527618279.ch37
These include the radius of gyration, inertial shape factor, etc.
Within that article they state that Ic = 0 for planar molecules. Ignoring
the inequality on page 1010, which says that Ic is the largest moment and
is contradicted by the rest of the text (particularly the inequalities on
page 1011), Ic corresponds to the smallest principal moment : PMI1.

So now I'm confused, but I'm hoping this is obvious to someone versed in
the field: I'd like to reproduce the descriptors described in the
Todeschini article, but I clearly can't do that using the actual moments of
inertia. I could keep using the eigenvalues of the covariance matrix there,
but that doesn't match what's described in the text.

Two things that would be extremely helpful:
1) an explanation of the disconnect here from someone who knows this stuff,
I would guess that it's pretty simple
2) The results of running the files github1262_1.mol, github1262_2.mol, and
github1262_3.mol from here:
https://github.com/greglandrum/rdkit/tree/fix/github1262/Code/GraphMol/MolTransforms/test_data
through Dragon and calculating the radius of gyration, inertial shape
factor, eccentricity, molecular asphericity, and spherocity index.

Best,
-greg



>
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Greg Landrum
Hi Guillaume,
I think it this case it's something else. According to the Todeschini article 
the smallest moment of inertia of a planar molecule like benzene should be 
zero. The eigenvalues of the inertia matrix for benzene, however, are 
definitely not zero (and not close enough that it's likely to be round-off 
error).It would be very nice if you could run the three files I mention through 
Dragon and let me know what it calculates for those descriptors.
-greg


_
From: Guillaume GODIN 
Sent: Sunday, January 15, 2017 1:11 PM
Subject: RE: [Rdkit-discuss] PMI API
To: Greg Landrum , RDKit Discuss 
, Chris Earnshaw 




Dear Greg,





I  suspect that it's a precision error or eigen algorithm shift between rdkit 
c++ & dragon.





To obtain good value, I suggest to try to implement a test on the eigen values 
like i did in gateway.cpp implementation.








JacobiSVD getSVD(MatrixXd A) {

    JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);

    return mysvd;

}




// get the A-1 matrix using 


MatrixXd GetPinv(MatrixXd A){

    JacobiSVD svd = getSVD(A);

    double  pinvtoler=1.e-2;// choose your tolerance wisely!

    VectorXd vs=svd.singularValues();

    VectorXd vsinv=svd.singularValues();




    for (unsignedint i=0; i pinvtoler )

           vsinv(i)=1.0/vs(i);

       else vsinv(i)=0.0;

    }




    MatrixXd S =  vsinv.asDiagonal();

    MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();

    return Ap;

}




If it's not solve the problem, I would like to test it in Matlab. can you 
provide me the 3 (3d xyz matrix) of your example please ?





I also have Dragon 6





best regards,

Dr. Guillaume GODINPrincipal ScientistChemoinformatic & 
DataminingInnovationCORPORATE R DIVISIONDIRECT LINE +41 (0)22 780 3645MOBILE  
        +41 (0)79 536 1039        Firmenich SA        RUE DES JEUNES 1 | CASE 
POSTALE 239 | CH-1211 GENEVE 8
De : Greg Landrum 
Envoyé : dimanche 15 janvier 2017 11:50
À : Chris Earnshaw; RDKit Discuss
Objet : Re: [Rdkit-discuss] PMI API I managed to make some time to look into 
this this weekend and I've found a bug and something I don't understand. 
Hopefully the community can help out here.

On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw  wrote:
4) The big one! The returned results look very odd. They appear to relate more 
to the dimensions of the molecule than the moments of inertia. For a rod-like 
molecule (dimethylacetylene) I'd expect two large and one small PMI (e.g. PMI1: 
6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828  NPR2: 0.98) but 
actually get PMI1: 0.061647  PMI2: 0.061652  PMI3: 25.3699  NPR1: 0.002430  
NPR2: 0.002430.
For disk-like (benzene) the result should be one large and two medium (e.g. 
PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2: 0.500013) 
but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1: 2.14213e-11  
NPR2: 0.33.
Finally for a roughly spherical molecule (neopentane) the NPR values look 
reasonable (no great surprise) but the absolute PMI values may be too small: 
old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2: 6.59488  
PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
Your expectations are correct: the current RDKit implementation is wrong. The 
corresponding github entry is here: 
https://github.com/rdkit/rdkit/issues/1262This is due to a mistake in the way 
the principal moments are calculated (which is due to the fact that I don't 
spend a lot of time working with/thinking about 3D descriptors). Instead of 
using the eigenvectors/eigenvalues of the inertia matrix (the tensor of 
inertia) the RDKit is currently using the covariance matrix. There's some more 
on the relationship between these two here: 
http://number-none.com/blow/inertia/deriving_i.html

The problem is easy to fix (and I have something working here: 
https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws up the 
values of the descriptors that are derived from here:Todeschini and Consoni 
"Descriptors from Molecular Geometry" Handbook of 
Chemoinformaticshttp://dx.doi.org/10.1002/9783527618279.ch37These include the 
radius of gyration, inertial shape factor, etc.Within that article they state 
that Ic = 0 for planar molecules. Ignoring the inequality on page 1010, which 
says that Ic is the largest moment and is contradicted by the rest of the text 
(particularly the inequalities on page 1011), Ic corresponds to the smallest 
principal moment : PMI1.
So now I'm confused, but I'm hoping this is obvious to someone versed in the 
field: I'd like to reproduce the descriptors described in the Todeschini 
article, but I clearly can't do that using the actual moments of inertia. I 
could 

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Guillaume GODIN
Dear Greg,


I  suspect that it's a precision error or eigen algorithm shift between rdkit 
c++ & dragon.


To obtain good value, I suggest to try to implement a test on the eigen values 
like i did in gateway.cpp implementation.



JacobiSVD getSVD(MatrixXd A) {

JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);

return mysvd;

}


// get the A-1 matrix using

MatrixXd GetPinv(MatrixXd A){

JacobiSVD svd = getSVD(A);

double  pinvtoler=1.e-2; // choose your tolerance wisely!

VectorXd vs=svd.singularValues();

VectorXd vsinv=svd.singularValues();


for (unsigned int i=0; i pinvtoler )

   vsinv(i)=1.0/vs(i);

   else vsinv(i)=0.0;

}


MatrixXd S =  vsinv.asDiagonal();

MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();

return Ap;

}


If it's not solve the problem, I would like to test it in Matlab. can you 
provide me the 3 (3d xyz matrix) of your example please ?


I also have Dragon 6


best regards,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Greg Landrum 
Envoyé : dimanche 15 janvier 2017 11:50
À : Chris Earnshaw; RDKit Discuss
Objet : Re: [Rdkit-discuss] PMI API

I managed to make some time to look into this this weekend and I've found a bug 
and something I don't understand. Hopefully the community can help out here.

On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
4) The big one! The returned results look very odd. They appear to relate more 
to the dimensions of the molecule than the moments of inertia. For a rod-like 
molecule (dimethylacetylene) I'd expect two large and one small PMI (e.g. PMI1: 
6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828  NPR2: 0.98) but 
actually get PMI1: 0.061647  PMI2: 0.061652  PMI3: 25.3699  NPR1: 0.002430  
NPR2: 0.002430.
For disk-like (benzene) the result should be one large and two medium (e.g. 
PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2: 0.500013) 
but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1: 2.14213e-11  
NPR2: 0.33.
Finally for a roughly spherical molecule (neopentane) the NPR values look 
reasonable (no great surprise) but the absolute PMI values may be too small: 
old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2: 6.59488  
PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35

Your expectations are correct: the current RDKit implementation is wrong. The 
corresponding github entry is here: https://github.com/rdkit/rdkit/issues/1262
This is due to a mistake in the way the principal moments are calculated (which 
is due to the fact that I don't spend a lot of time working with/thinking about 
3D descriptors). Instead of using the eigenvectors/eigenvalues of the inertia 
matrix (the tensor of inertia) the RDKit is currently using the covariance 
matrix. There's some more on the relationship between these two here: 
http://number-none.com/blow/inertia/deriving_i.html

The problem is easy to fix (and I have something working here: 
https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws up the 
values of the descriptors that are derived from here:
Todeschini and Consoni "Descriptors from Molecular Geometry" Handbook of 
Chemoinformatics http://dx.doi.org/10.1002/9783527618279.ch37
These include the radius of gyration, inertial shape factor, etc.
Within that article they state that Ic = 0 for planar molecules. Ignoring the 
inequality on page 1010, which says that Ic is the largest moment and is 
contradicted by the rest of the text (particularly the inequalities on page 
1011), Ic corresponds to the smallest principal moment : PMI1.

So now I'm confused, but I'm hoping this is obvious to someone versed in the 
field: I'd like to reproduce the descriptors described in the Todeschini 
article, but I clearly can't do that using the actual moments of inertia. I 
could keep using the eigenvalues of the covariance matrix there, but that 
doesn't match what's described in the text.

Two things that would be extremely helpful:
1) an explanation of the disconnect here from someone who knows this stuff, I 
would guess that it's pretty simple
2) The results of running the files github1262_1.mol, github1262_2.mol, and 
github1262_3.mol from here: 
https://github.com/greglandrum/rdkit/tree/fix/github1262/Code/GraphMol/MolTransforms/test_data
 through Dragon and calculating the radius of gyration, inertial shape factor, 
eccentricity, molecular asphericity, and spherocity index.

Best,
-greg
**  
DISCLAIMER  
This email and any files 

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Peter Gedeck
According to this:
https://en.wikipedia.org/wiki/List_of_moments_of_inertia
The moments of inertia of a disk (something like benzene) are:

Iz = mr^2/2
Ix = Iy = mr^2/4

None of them is zero. The smallest moment of inertia of a rod-like molecule
(e.g. C#C) is zero.

Best,

Peter



On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum  wrote:

> Hi Guillaume,
>
> I think it this case it's something else. According to the Todeschini
> article the smallest moment of inertia of a planar molecule like benzene
> should be zero. The eigenvalues of the inertia matrix for benzene, however,
> are definitely not zero (and not close enough that it's likely to be
> round-off error).
> It would be very nice if you could run the three files I mention through
> Dragon and let me know what it calculates for those descriptors.
>
> -greg
>
>
> _
> From: Guillaume GODIN 
> Sent: Sunday, January 15, 2017 1:11 PM
> Subject: RE: [Rdkit-discuss] PMI API
> To: Greg Landrum , RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>, Chris Earnshaw  >
>
>
>
> Dear Greg,
>
>
> I  suspect that it's a precision error or eigen algorithm shift between
> rdkit c++ & dragon.
>
>
> To obtain good value, I suggest to try to implement a test on the eigen
> values like i did in gateway.cpp implementation.
>
>
>
> JacobiSVD getSVD(MatrixXd A) {
>
> JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);
>
> return mysvd;
>
> }
>
>
> // get the A-1 matrix using
>
> MatrixXd GetPinv(MatrixXd A){
>
> JacobiSVD svd = getSVD(A);
>
> double  pinvtoler=1.e-2;// choose your tolerance wisely!
>
> VectorXd vs=svd.singularValues();
>
> VectorXd vsinv=svd.singularValues();
>
>
> for (unsignedint i=0; i
> if ( vs(i) > pinvtoler )
>
>vsinv(i)=1.0/vs(i);
>
>else vsinv(i)=0.0;
>
> }
>
>
> MatrixXd S =  vsinv.asDiagonal();
>
> MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();
>
> return Ap;
>
> }
>
>
> If it's not solve the problem, I would like to test it in Matlab. can you
> provide me the 3 (3d xyz matrix) of your example please ?
>
>
> I also have Dragon 6
>
>
> best regards,
>
> *Dr. Guillaume GODIN*
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R DIVISION
> DIRECT LINE +41 (0)22 780 3645 <022%20780%2036%2045>
> MOBILE  +41 (0)79 536 1039 <079%20536%2010%2039>
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
> --
> *De :* Greg Landrum 
> *Envoyé :* dimanche 15 janvier 2017 11:50
> *À :* Chris Earnshaw; RDKit Discuss
> *Objet :* Re: [Rdkit-discuss] PMI API
>
> I managed to make some time to look into this this weekend and I've found
> a bug and something I don't understand. Hopefully the community can help
> out here.
> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
>
> 4) The big one! The returned results look very odd. They appear to relate
> more to the dimensions of the molecule than the moments of inertia. For a
> rod-like molecule (dimethylacetylene) I'd expect two large and one small
> PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828
> NPR2: 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:
> 25.3699  NPR1: 0.002430  NPR2: 0.002430.
> For disk-like (benzene) the result should be one large and two medium
> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2:
> 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1:
> 2.14213e-11  NPR2: 0.33.
> Finally for a roughly spherical molecule (neopentane) the NPR values look
> reasonable (no great surprise) but the absolute PMI values may be too
> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2:
> 6.59488  PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>
>
> Your expectations are correct: the current RDKit implementation is wrong.
> The corresponding github entry is here:
> https://github.com/rdkit/rdkit/issues/1262
> This is due to a mistake in the way the principal moments are calculated
> (which is due to the fact that I don't spend a lot of time working
> with/thinking about 3D descriptors). Instead of using the
> eigenvectors/eigenvalues of the inertia matrix (the tensor of inertia) the
> RDKit is currently using the covariance matrix. There's some more on the
> relationship between these two here:
> http://number-none.com/blow/inertia/deriving_i.html
>
> The problem is easy to fix (and I have something working here:
> https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws
> up the values of the descriptors that are derived from here:
> Todeschini and Consoni "Descriptors from Molecular Geometry" Handbook of
> 

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Guillaume GODIN
No problem Greg,


reading carefully the Todeschini article, them said that Ic,Ib,Ia are determine 
as max & min values of I other all 3D axis passing throught the center of mass!


The "global PM" is never zero (sum of mi*ri*ri) idem for PMi even for planar 
molecule.


But When you have a planar molecule, the matrix is no more 3D but 2D! so it's 
normal to consider that the 3nd PM is zero.


BR,


Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Greg Landrum 
Envoyé : dimanche 15 janvier 2017 17:42
À : Guillaume GODIN; RDKit Discuss
Objet : Re: [Rdkit-discuss] PMI API

Thanks Guillaume!

On Sun, Jan 15, 2017 at 5:01 PM, Guillaume GODIN 
> wrote:

Here, Dragon results for the 3 molecules: I've included both  Whim and 3D 
descriptors but I don't have access to PMi!


I found the second document in agreement with Peter answer...


BR,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Peter Gedeck >
Envoyé : dimanche 15 janvier 2017 15:07
À : Greg Landrum; RDKit Discuss; Guillaume GODIN

Objet : Re: [Rdkit-discuss] PMI API

According to this:
https://en.wikipedia.org/wiki/List_of_moments_of_inertia
The moments of inertia of a disk (something like benzene) are:

Iz = mr^2/2
Ix = Iy = mr^2/4

None of them is zero. The smallest moment of inertia of a rod-like molecule 
(e.g. C#C) is zero.

Best,

Peter



On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum 
> wrote:
Hi Guillaume,

I think it this case it's something else. According to the Todeschini article 
the smallest moment of inertia of a planar molecule like benzene should be 
zero. The eigenvalues of the inertia matrix for benzene, however, are 
definitely not zero (and not close enough that it's likely to be round-off 
error).
It would be very nice if you could run the three files I mention through Dragon 
and let me know what it calculates for those descriptors.

-greg


_
From: Guillaume GODIN 
>
Sent: Sunday, January 15, 2017 1:11 PM
Subject: RE: [Rdkit-discuss] PMI API
To: Greg Landrum >, RDKit 
Discuss 
>,
 Chris Earnshaw >




Dear Greg,


I  suspect that it's a precision error or eigen algorithm shift between rdkit 
c++ & dragon.


To obtain good value, I suggest to try to implement a test on the eigen values 
like i did in gateway.cpp implementation.



JacobiSVD getSVD(MatrixXd A) {

JacobiSVD mysvd(A,  ComputeThinU | ComputeThinV);

return mysvd;

}


// get the A-1 matrix using

MatrixXd GetPinv(MatrixXd A){

JacobiSVD svd = getSVD(A);

double  pinvtoler=1.e-2;// choose your tolerance wisely!

VectorXd vs=svd.singularValues();

VectorXd vsinv=svd.singularValues();


for (unsignedint i=0; i pinvtoler )

   vsinv(i)=1.0/vs(i);

   else vsinv(i)=0.0;

}


MatrixXd S =  vsinv.asDiagonal();

MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();

return Ap;

}


If it's not solve the problem, I would like to test it in Matlab. can you 
provide me the 3 (3d xyz matrix) of your example please ?


I also have Dragon 6


best regards,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8


De : Greg Landrum >
Envoyé : dimanche 15 janvier 2017 11:50
À : Chris Earnshaw; RDKit Discuss
Objet : Re: [Rdkit-discuss] PMI API

I managed to make some time to look into this this weekend and I've found a bug 
and something I don't understand. Hopefully the community can help out here.
On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
4) The big one! The returned results look very odd. They appear to relate more 
to the dimensions of the molecule than the moments of inertia. For a