Quoting Rajarshi Guha <rajarshi.g...@gmail.com>:

>
> On Jan 26, 2010, at 11:30 AM, Vincent Le Guilloux wrote:
>
>> Dear cdk users,
>>
>> It seems that it's impossible to get other results than NaN values for
>> the following descriptors:
>>
>> Wgamma1.unity = NaN
>> Wgamma2.unity = NaN
>> Wgamma3.unity = NaN
>> WG.unity = NaN
>
>
> It's a known (but unfortunately undocumented) issue. I've seen these
> NaNs from time to time but haven't gotten round to investigating them.
> If I recall correctly, the problem is in the determination of asymmetric
> and symmetric atoms (gamma descriptors).
>

Yes indeed :). I had a quick look at the source code, and saw that the  
problem arises in this loop:

// look for symmetric & asymmetric atoms for the gamma descriptor
for (int i = 0; i < 3; i++) {
     double ns = 0.0;
     double na = 0.0;
     for (int j = 0; j < ac.getAtomCount(); j++) {
         boolean foundmatch = false;
         for (int k = 0; k < ac.getAtomCount(); k++) {
             if (k == j) continue;
             if (scores[j][i] == -1 * scores[k][i]) {
                 ns++;
                 foundmatch = true;
                 break;
             }
         }
         if (!foundmatch) na++;
     }
     double n = (double) ac.getAtomCount();
     gamma[i] = -1.0 * ((ns / n) * Math.log(ns / n) / Math.log(2.0)
             + (na / n) * Math.log(1.0 / n) / Math.log(2.0));
     gamma[i] = 1.0 / (1.0 + gamma[i]);
}

The problem is that the number of symmetric atoms, ns, is always 0. As a
consequence, ns / n = 0 and Math.log(ns / n) = -Infinity. Multiplying the
two (0 * -Infinity) gives NaN, which then propagates into gamma[i].
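To make the NaN concrete, here is a minimal standalone sketch. The class and method names are made up for this example and are not CDK code; the expression is simply the first term of the gamma formula above.

```java
class GammaNaNDemo {
    // First term of the gamma formula, copied from the loop above.
    static double symmetricTerm(double ns, double n) {
        return (ns / n) * Math.log(ns / n) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double n = 12.0; // illustrative atom count (benzene with explicit hydrogens)
        // The logarithm of zero is negative infinity...
        System.out.println(Math.log(0.0 / n));
        // ...and zero times negative infinity is NaN under IEEE 754 arithmetic.
        System.out.println(symmetricTerm(0.0, n));
    }
}
```

Once this term is NaN, every later operation on gamma[i] stays NaN.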

I guess a default value is needed for the case where ns is 0, which
would at least fix the NaN.
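A minimal sketch of such a guard, assuming the usual entropy convention that 0 * log(0) is taken as 0. That convention is my assumption, not something I checked against the WHIM definition, and the helper names are hypothetical rather than part of the CDK.

```java
class GammaGuardSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Hypothetical helper: same formula as in the loop above, except that the
    // symmetric term is set to 0 when ns == 0 (assumed convention: 0 * log 0 = 0).
    static double gamma(double ns, double na, double n) {
        double symTerm = (ns > 0.0) ? (ns / n) * log2(ns / n) : 0.0;
        double asymTerm = (na / n) * log2(1.0 / n);
        double g = -1.0 * (symTerm + asymTerm);
        return 1.0 / (1.0 + g);
    }

    public static void main(String[] args) {
        // No symmetric atoms at all: the result is now a finite number, not NaN.
        System.out.println(gamma(0.0, 12.0, 12.0));
    }
}
```

This only hides the symptom. It does not explain why ns is never greater than 0 in the first place.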

However, I think the algorithm itself is broken, since ns should not
always be 0 as is currently the case. I don't really know whether the
approach is theoretically sound for detecting symmetric atoms, but in any
case, comparing two double values coming out of a PCA computation for
exact equality (here: scores[j][i] == -1 * scores[k][i]) isn't a good
idea because of floating-point imprecision. But it's just a guess...
Taking benzene as an example, here are the scores that get compared with
each other, for the 6 carbon atoms:

-0.0010549268112963923
  1.074278584188431E-4
-4.564022550040472E-4
  4.717739512629139E-4
  7.722618841847252E-4
-0.0011905526719529509

Note that hydrogens are also compared in the algorithm.
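As an illustration of the floating-point point, here is a sketch that counts symmetric atoms with a tolerance instead of ==. The class name, method name and the tolerance value are assumptions made for this example (a sensible tolerance depends on the scale of the scores), and the input is just the six carbon scores listed above.

```java
class SymmetryToleranceSketch {
    // The six benzene carbon scores quoted above.
    static final double[] BENZENE_SCORES = {
        -0.0010549268112963923,
         1.074278584188431E-4,
        -4.564022550040472E-4,
         4.717739512629139E-4,
         7.722618841847252E-4,
        -0.0011905526719529509
    };

    // Counts atoms j that have a partner k != j whose score is approximately
    // the negative of scores[j], i.e. |scores[j] + scores[k]| <= tol.
    // With tol = 0.0 this reduces to the exact comparison used in the loop above.
    static int countSymmetric(double[] scores, double tol) {
        int ns = 0;
        for (int j = 0; j < scores.length; j++) {
            for (int k = 0; k < scores.length; k++) {
                if (k == j) continue;
                if (Math.abs(scores[j] + scores[k]) <= tol) {
                    ns++;
                    break;
                }
            }
        }
        return ns;
    }

    public static void main(String[] args) {
        System.out.println(countSymmetric(BENZENE_SCORES, 0.0));  // exact match
        System.out.println(countSymmetric(BENZENE_SCORES, 1e-4)); // arbitrary tolerance
    }
}
```

Whether a tolerance is the right fix, or whether the detection approach itself needs to change, I can't say. The sketch only shows that exact equality finds no matches on these values.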

Also, one last remark and then I'll stop bothering you: why not just
emit a warning or something like that, instead of throwing an exception,
when 2D coordinates are detected? I ask because if I build a benzene in
Marvin and calculate 3D coordinates (still in Marvin), the coordinates
will still look 2D, as benzene is planar... and I will not be able to
calculate any of the 3D descriptors from the CDK.

Anyway, thanks for your answer :)
vincent



_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
