Gregoire
 
Your question breaks into several parts. The following are in no particular order.
 
(o) the nscore transformation (as I understand it) transforms your data histogram into a Normal distribution. Questions arise which have been around since the first statistician proposed a 'distribution free' transformation: how exactly does your nscore transform deal with zeroes? Are nscores allocated randomly to the zeroes, or by some sort of proximity weighting? If the former, you introduce an added random component into your data structure amongst the zeroes and any other repeated values. The same questions apply to rank/order transforms and to any other mechanical or numerical mapping from histogram to distribution (such as Hermitian polynomials). You also need declustered or truly random data to produce a valid nscore transformation.
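To make the tie-handling point concrete, here is a rough sketch of a normal score transform in Python, not any particular package's implementation; the toy data and both tie-breaking rules are purely illustrative:

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(42)
# Toy skewed sample with five repeated zeroes.
data = np.concatenate([np.zeros(5), rng.lognormal(0.0, 1.0, 15)])
n = len(data)

def nscore(values, tie_method):
    """Map data ranks to standard-Normal quantiles (plotting positions)."""
    ranks = rankdata(values, method=tie_method)          # ranks 1..n, ties per method
    return norm.ppf(ranks / (len(values) + 1.0))         # Normal quantiles in (-inf, inf)

# 'average' ranking gives every zero the same nscore; random tie-breaking
# (shuffle first, then 'ordinal' ranks) gives each zero a different one,
# which is the "added random component" among repeated values.
same_score = nscore(data, "average")
order = rng.permutation(n)
random_scores = np.empty(n)
random_scores[order] = nscore(data[order], "ordinal")

assert np.unique(same_score[data == 0]).size == 1        # one shared nscore
assert np.unique(random_scores[data == 0]).size == 5     # five distinct nscores
```

Running the transform again after adding samples changes every rank, and hence every nscore, which is the stability issue raised further down.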
 
(o) are your zeroes really zero or non-detect? If they are really zero, your observation is that the phenomenon of interest does not actually occur at that location. This implies that you have two 'populations' and should perhaps consider an indicator approach to separate the populations or map 'likelihood of occurrence' rather than consider the zeroes part of the same population. If the zeroes are 'below detection limit' then random nscore might be the best way to go. Or possibly a lognormal distribution with an additive constant.
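The two-population idea can be sketched in a few lines; all values here are invented for illustration, and the additive constant is a data-dependent choice, not a recommendation:

```python
import numpy as np

# Hypothetical measurements where zero means "phenomenon absent".
z = np.array([0.0, 0.0, 1.2, 0.0, 3.4, 0.7, 0.0, 5.1])

# Indicator coding: 1 where the phenomenon occurs, 0 where it does not.
# Kriging the indicator maps 'likelihood of occurrence'; the positive
# values are then modelled as a separate population.
indicator = (z > 0).astype(int)
positives = z[z > 0]

# For below-detection-limit zeroes, a lognormal with an additive
# constant shifts the zeroes off the log singularity instead.
c = 0.1                       # additive constant (choice is data-dependent)
y = np.log(z + c)
```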
 
(o) using a lognormal (or any other) distribution model implies that you believe your histogram is constructed from a limited amount of data drawn from some ideal population distribution. Transforming the data does not turn your histogram into an ideal Normal distribution; it turns it into a histogram of a set of samples from a Normal distribution. The advantage of this approach is that your transformation does not change when you add more sampling, unlike nscore and similar transforms, whose mapping must be recomputed every time data are added.
 
(o) it is very simple to test whether your data is likely to come from a lognormal (or other) distribution. Distribution fitting and probability paper applications have been in use for over 50 years, and statistical tests such as the chi-squared and Kolmogorov-Smirnov almost as long. You also have the option to verify whether backtransforms such as the lognormal are appropriate during the cross-validation stage of your analysis. Probability plots also help you judge whether you have a multi-component (non-homogeneous) data set rather than one which is simply skewed.
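For example, with SciPy on a synthetic positive-valued sample (bearing in mind that fitting and testing on the same data makes the KS p-value optimistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=200)    # synthetic data

# Fit a lognormal by maximum likelihood (location fixed at zero),
# then check the fit with a Kolmogorov-Smirnov test.
shape, loc, scale = stats.lognorm.fit(sample, floc=0)
ks_stat, p_value = stats.kstest(sample, "lognorm", args=(shape, loc, scale))
print(f"KS statistic {ks_stat:.3f}, p-value {p_value:.3f}")

# The probability-paper check: logs of a lognormal sample should plot
# as a straight line against Normal quantiles; departures (kinks,
# multiple slopes) suggest a multi-component data set.
osm, osr = stats.probplot(np.log(sample), dist="norm", fit=False)
```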
 
(o) your choice depends on what you are using the results for. For simulation purposes, nscore will work as well as a distribution-modelling approach, provided your original transform was carried out on declustered data. For any linear kriging method, you would have to work out a backtransformation which allows for the smaller variance of the kriged values. I am not familiar enough with the relevant software to know whether this is done automatically for nscore transformations. Simply reversing the nscore transform won't work. The parametric backtransform for lognormal kriging, for example, includes components to ensure that the backtransform produces unbiased estimates in the original data space.
I guess the short answer to your question depends on whether your personal preference is statistical or computational. I was trained to consider the actual sample set as a guide to what the population looked like, not as an exact match. When lognormal transforms do not work on my projects, I turn to rank and indicator transforms, which are distribution free. An nscore would be equivalent to a rank transform but would result in a Normal rather than a uniform 'data' set. Again, the usefulness depends on whether you are modelling for interpretation, simulation or estimation.
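The variance correction in the lognormal backtransform can be sketched for the simple-kriging case; the numbers below are hypothetical, and ordinary kriging adds a Lagrange-multiplier term on top of this:

```python
import numpy as np

# Unbiased backtransform for simple lognormal kriging:
#   Z* = exp(Y* + sigma_SK^2 / 2)
# where Y* is the kriged estimate in log space and sigma_SK^2 its
# simple-kriging variance. Plain exp(Y*) is biased low, because kriged
# values have a smaller variance than the original data.
y_star = np.array([0.8, 1.1, 0.4])       # kriged log-space estimates (hypothetical)
sk_var = np.array([0.30, 0.15, 0.45])    # simple-kriging variances (hypothetical)

naive = np.exp(y_star)                   # simple reversal: ignores the correction
z_star = np.exp(y_star + sk_var / 2.0)   # variance-corrected backtransform

print(z_star)
```

The corrected estimates are always larger than the naively reversed ones, which is exactly the smoothing effect the correction compensates for.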
 
Sorry to be so long-winded but it is a much more complex problem than at first glance. I hope someone out there will fill in the gaps left by my own inexperience with nscore transforms.
 
Isobel
http://www.kriging.com
 
 

Gregoire Dubois <[EMAIL PROTECTED]> wrote:
Dear list,
I am puzzled about the use of logarithmic and nscore transforms in geostatistics.

Given the apparent advantages of nscore transforms over the logarithmic transform (nscore has no problem dealing with 0 values and handles the tails of the distribution more efficiently), why would one still want to use log-normal kriging? Because of the mathematical elegance of using a model only?
Moreover, one can frequently not be certain about the lognormality of the analysed dataset, so why would one still take the risk of using log-normal kriging?
Thank you in advance for any feedback on this issue.

Best regards,
Gregoire

__________________________________________
Gregoire Dubois (Ph.D.)
European Commission (EC)
Joint Research Centre Directorate (DG JRC)
Institute for Environment and Sustainability (IES)
TP 441, Via Fermi 1
21020 Ispra (VA)
ITALY
 
Tel. +39 (0)332 78 6360
Fax. +39 (0)332 78 5466
Email: [EMAIL PROTECTED]
WWW: http://www.ai-geostats.org
WWW: http://rem.jrc.cec.eu.int
 
"The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission."
