Re: AI-GEOSTATS: SIC2004: Automatic (one-click) mapping

2004-04-21 Thread Gerald Boogaart
Dear Gregoire, Dear List

I am not sure whether it is allowed to start a discussion on SIC2004 before it 
actually starts. Anyway, I would like to promote discussion of the following 
point:

In one sentence: Due to game theory, one of the worst blind algorithms will 
perform best in SIC2004.

The point is:
A fully automatic estimation algorithm has to obey the laws of game theory. 
In particular, we face the classical problem of statistical optimality: 

Let L(A,P) describe any negative measure of fitness (the expected loss in 
statistics) of an algorithm A coping with a truth that has probability 
distribution P. Then in general there does not exist any algorithm A0 with 

L(A0,P) <= L(A,P) for all A and P

That leads to the definition of an admissible estimator A0 in statistics: 
there does not exist any A1 that is strictly better, i.e. 

there is no A1 such that L(A1,P) <= L(A0,P) for all P, 
with L(A1,P) < L(A0,P) for at least one P. 

Comparing two admissible estimators A0 and A1 for P in {P0,P1} leads to the 
following conclusion:
If L(A0,P0) < L(A1,P0), then L(A0,P1) > L(A1,P1)

Thus the estimator performing best on that one problem will probably be 
worse on others, because a simple, specific algorithm fitted to the specific 
problem will perform best. The question is just which simple, specific 
algorithm will win (that depends on the problem, since specific 
algorithms perform best on their own problem but worse on others). But what 
we need for blind mapping is something totally different:

It should perform well for all P. (Or even better: bail out with an error 
message when it is not able to give good results.)

This corresponds to the concept of minimax estimators, which minimize the 
maximum of L(A,P) over all P, or to Bayes estimators, which minimize the mean 
of L(A,P) over all expected P. 
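
The two criteria can be illustrated with a minimal sketch. The loss table, the algorithm names and the uniform prior below are invented purely for illustration; they are assumptions, not results from any real comparison.

```python
# Hypothetical loss table L(A, P): one row per algorithm, one entry per truth P.
# All numbers are made up for illustration.
losses = {
    "specialist_P0": {"P0": 0.10, "P1": 0.90},
    "specialist_P1": {"P0": 0.80, "P1": 0.15},
    "robust":        {"P0": 0.30, "P1": 0.35},
}

# Assumed prior weights over the possible truths (needed for the Bayes criterion only).
prior = {"P0": 0.5, "P1": 0.5}


def worst_case(alg):
    """Maximum of L(A, P) over all P: the quantity a minimax estimator minimizes."""
    return max(losses[alg].values())


def bayes_risk(alg):
    """Prior-weighted mean of L(A, P): the quantity a Bayes estimator minimizes."""
    return sum(prior[p] * losses[alg][p] for p in prior)


def best_on(p):
    """Algorithm with the smallest loss when the truth is fixed to p."""
    return min(losses, key=lambda alg: losses[alg][p])


minimax_choice = min(losses, key=worst_case)
bayes_choice = min(losses, key=bayes_risk)

print("minimax choice:", minimax_choice)
print("Bayes choice:  ", bayes_choice)
print("best on P0:    ", best_on("P0"))
print("best on P1:    ", best_on("P1"))
```

With these invented numbers both criteria select the robust algorithm, although it is not the best one for either single truth.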

However, for a minimax estimator there typically exists, for any fixed P, a 
better algorithm. And because many algorithms are in the test, but only one 
problem, we will see one of the naive ones perform best.
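
A toy sketch of this effect, under strongly simplified assumptions: each hypothetical "naive" algorithm is tuned to exactly one problem, and one "robust" algorithm has the same moderate loss everywhere. The loss values and names are invented and only serve to show the ranking logic.

```python
def build_losses(n_problems, good=0.1, bad=1.0, robust=0.3):
    """Build a hypothetical loss table: algorithm name -> list of losses per problem.

    naive_i is very good on problem i and bad on all others;
    the robust algorithm has the same moderate loss on every problem.
    """
    table = {}
    for i in range(n_problems):
        table[f"naive_{i}"] = [good if j == i else bad for j in range(n_problems)]
    table["robust"] = [robust] * n_problems
    return table


def winner_on(table, problem):
    """Winner of a contest that uses one single problem."""
    return min(table, key=lambda alg: table[alg][problem])


def winner_overall(table):
    """Winner of a contest that averages the loss over all problems."""
    return min(table, key=lambda alg: sum(table[alg]) / len(table[alg]))


table = build_losses(5)
single_winners = [winner_on(table, j) for j in range(5)]
overall = winner_overall(table)

print("winners of the single-problem contests:", single_winners)
print("winner averaged over all problems:     ", overall)
```

In this sketch the robust algorithm never wins a contest based on a single problem, whichever problem is chosen, but it wins as soon as the loss is averaged over several problems.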

As an example, compare two algorithms using ordinary kriging with Alg1: a 
linear variogram, Alg2: a power variogram. 

If the data indeed obey a linear variogram, Alg1 is BLUE, and Alg2 
estimates a power near one and will be nearly BLUE. Alg1 wins and Alg2 is 
slightly worse. 

If the data obey a spherical variogram, Alg2 performs better than Alg1, but 
will be outperformed by simple inverse squared distance methods. 

However, Alg2 performs well in both cases. 

Thus I would propose modifying SIC2004 in the following way: 
Give multiple problems.

Hoping for a nice discussion,
Gerald





On Wednesday 14 April 2004 11:01, Gregoire Dubois wrote:
 Good day everyone!

 Time is ripe for a new SIC (Spatial Interpolation Comparison) exercise !

 The second edition of SIC (SIC2004
 http://www.ai-geostats.org/events/sic2004.htm ) will be launched by
 the end of this month. The topic of this year will be automatic
 mapping, that is the use of algorithms for spatial interpolation that
 will not require any intervention or decision from the users. Hence the
 expression one-click mapping. Such algorithms would be obviously more
 than useful in the frame of environmental monitoring networks (e.g.
 automatic mapping of ozone levels in cities, radioactivity in the
 environment, etc.).  However, SIC97 has shown that it was very difficult
 to generate good results if one is not using the information provided by
 the spatial correlation (i.e. semivariograms). Can we today blindly use
 functions for the automatic fitting of semivariograms? Can machine
 learning algorithms compete with geostatistical functions?

 As for SIC97 (see http://www.ai-geostats.org/events/sic97.htm ),
 participants to SIC2004
 will receive a subset of an environmental data set (typically
 measurements of an environmental variable + spatial coordinates of the
 sampling places) and will have to estimate the values taken by the
 variable at the remaining locations of the full data set. The true
 values found at these locations will be made public only at the end of
 the exercise. Various criteria will be used to assess the performances
 of the interpolation algorithms (time of calculation, minimum errors,
 etc.).

 Because everything should be automatic, participants to SIC2004 will
 have to prepare their algorithms before receiving the data: only
 sampling locations will be given and no interaction with the algorithm
 will be allowed during the exercise. No worry, participants will have
 from the end of this month until the 15th of September to set up their
 functions.

 Participants to SIC2004 will be invited at the end of the exercise to
 submit a manuscript for publication in the online journal GIDA
 (Geographic Information and Decision Analysis). A selected number of
 papers will be published in a book (a European Report hardcopy) with
 some unpublished material provided by the editorial board.

 For more information, please visit 

RE: AI-GEOSTATS: SIC2004: Automatic (one-click) mapping

2004-04-21 Thread Gregoire Dubois
Hi Gerald, hello everyone,

> I am not sure whether it is allowed to start a discussion on SIC2004
> before it actually starts.

Any comments or discussion about automatic mapping are of course more than
welcome!

Actually they are more than urgent since I want to respect all deadlines
and we have only a few days left.
 
Discussions about automatic mapping would potentially interest anyone
working in spatial statistics, and SIC2004 would certainly benefit from
them. Discussions about the management and organisation of SIC2004 should
be sent directly to me, not to the list.

If you want to contribute to SIC2004 without playing the best
estimation game, please do not hesitate to submit a manuscript that
would be published in the hardcopy version of SIC2004 (after reviewing
of course).

Only one thing is sure so far: SIC2004 will be about automatic mapping
of daily measurements of a variable X (no revelation at this point), and
some prior information will be given to tune all the parameters in
advance.

We still have to define what prior information to give exactly. A
histogram, other data (subsets of the whole dataset, basic statistics,
repeated measurements of X at the same locations but for other days,
...?)

This is where we are now.

Best wishes and thanks in advance for any feedback,

Gregoire





