Quoting Rajarshi Guha <rajarshi.g...@gmail.com>:

> On Jan 26, 2010, at 11:30 AM, Vincent Le Guilloux wrote:
>
>> Dear cdk users,
>>
>> It seems that it's impossible to get other results than NaN values for
>> the following descriptors:
>>
>> Wgamma1.unity = NaN
>> Wgamma2.unity = NaN
>> Wgamma3.unity = NaN
>> WG.unity = NaN
>
> It's a known (but unfortunately undocumented) issue. I've seen these
> NaNs from time to time but haven't gotten round to investigating it.
> If I recall correctly the problem is in the determination of asymmetric
> and symmetric atoms (gamma descriptors)

Yes indeed :). I had a quick look at the source code and saw that the
problem arises in this loop:

    // look for symmetric & asymmetric atoms for the gamma descriptor
    for (int i = 0; i < 3; i++) {
        double ns = 0.0;
        double na = 0.0;
        for (int j = 0; j < ac.getAtomCount(); j++) {
            boolean foundmatch = false;
            for (int k = 0; k < ac.getAtomCount(); k++) {
                if (k == j) continue;
                if (scores[j][i] == -1 * scores[k][i]) {
                    ns++;
                    foundmatch = true;
                    break;
                }
            }
            if (!foundmatch) na++;
        }
        double n = (double) ac.getAtomCount();
        gamma[i] = -1.0 * ((ns / n) * Math.log(ns / n) / Math.log(2.0)
                + (na / n) * Math.log(1.0 / n) / Math.log(2.0));
        gamma[i] = 1.0 / (1.0 + gamma[i]);
    }

The problem is that the number of symmetric atoms, ns, is always 0. As a
consequence, ns / n = 0 and Math.log(ns / n) = -Infinity, so the product
0 * -Infinity evaluates to NaN.

I guess a default value is needed when ns is 0, which would fix this
issue. However, I think the algorithm itself is broken, since ns should
not always be 0 as is currently the case. I don't really know whether
the algorithm is theoretically sound for detecting symmetric atoms, but
in any case I think that comparing two double values coming out of a PCA
computation with == isn't a good idea because of floating-point
imprecision (here: scores[j][i] == -1 * scores[k][i]). But it's just a
guess...

If I take benzene as an example, here are the scores that get compared
to each other for the 6 carbon atoms:

    -0.0010549268112963923
    1.074278584188431E-4
    -4.564022550040472E-4
    4.717739512629139E-4
    7.722618841847252E-4
    -0.0011905526719529509

Note that hydrogens are also compared in the algorithm.

Also, just a last remark and then I'll stop bothering you: why not just
issue a warning or something like that, instead of throwing an
exception, when 2D coordinates are detected? I ask because if I build a
benzene in Marvin and calculate 3D coordinates (still in Marvin), the
coordinates will still look 2D as benzene is planar...
And I will not be able to calculate any of the 3D descriptors from the
CDK.

Anyway, thanks for your answer :)

vincent

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
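
For reference, here is a minimal sketch of the two changes suggested in the message above. It counts an atom as symmetric when its score and another atom's score cancel to within a tolerance, instead of requiring an exact match with ==. It also treats the (ns / n) * log2(ns / n) term as 0 when ns is 0, which is the usual limit convention for x log x. The class name GammaSketch, the helper xLog2x, the tol parameter and the plain double[][] scores argument (indexed [atom][component]) are all hypothetical and not part of the CDK. This is not the CDK's actual fix, and a sensible tolerance would depend on the scale of the PCA scores.

```java
class GammaSketch {

    /** Returns x * log2(x), using the limit value 0 when x is 0 (assumed convention). */
    public static double xLog2x(double x) {
        return x == 0.0 ? 0.0 : x * Math.log(x) / Math.log(2.0);
    }

    /**
     * Same loop structure as the quoted code, but with a tolerance-based
     * comparison and a guard for ns == 0. Expects at least one atom and
     * three score components per atom.
     */
    public static double[] gamma(double[][] scores, double tol) {
        int n = scores.length;
        double[] gamma = new double[3];
        for (int i = 0; i < 3; i++) {
            int ns = 0; // atoms with a mirror partner along component i
            int na = 0; // atoms without one
            for (int j = 0; j < n; j++) {
                boolean foundMatch = false;
                for (int k = 0; k < n; k++) {
                    if (k == j) continue;
                    // scores[j][i] == -scores[k][i] becomes |a + b| <= tol
                    if (Math.abs(scores[j][i] + scores[k][i]) <= tol) {
                        foundMatch = true;
                        break;
                    }
                }
                if (foundMatch) ns++; else na++;
            }
            double fs = (double) ns / n;
            double fa = (double) na / n;
            // Same expression as the original, with the ns == 0 case guarded.
            double entropy = -(xLog2x(fs) + fa * Math.log(1.0 / n) / Math.log(2.0));
            gamma[i] = 1.0 / (1.0 + entropy);
        }
        return gamma;
    }

    public static void main(String[] args) {
        // Toy scores (made up for illustration): atoms 0 and 1 mirror each
        // other along component 0 up to a rounding-sized error of 1e-12.
        double[][] scores = {
            {1.0, 0.0, 0.0},
            {-1.0 + 1e-12, 0.0, 0.0},
            {0.5, 0.0, 0.0}
        };
        System.out.println("exact comparison: " + gamma(scores, 0.0)[0]);
        System.out.println("tolerance 1e-9:   " + gamma(scores, 1e-9)[0]);
    }
}
```

With the guard in place, the exact comparison (tol = 0.0) no longer produces NaN even when no symmetric pair is found, and a non-zero tolerance lets near-mirror scores count as symmetric. Note that a tolerance also makes any two atoms whose scores are both close to zero match each other, so the value would need some care for scores as small as the benzene ones listed above.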