In article <[EMAIL PROTECTED]>,
Rajarshi Guha  <[EMAIL PROTECTED]> wrote:
>On Thu, 17 Jun 2004 23:53:43 -0400, Richard Ulrich wrote:

>> On Thu, 17 Jun 2004 16:13:30 -0400, Rajarshi Guha
>> <[EMAIL PROTECTED]> wrote:

>>> Hi,
>>>   I'm not sure if this is the correct place to post this, so if not I'd
>>> appreciate pointers to where I could.

>>> When building models (say, regression or neural network) we need to choose
>>> a set of 'information rich' independent variables. 

>>> Is there any literature related to this topic?


>> "Information theory" has a serious concern, but I think you 
>> are after something different than Shannon's Information.
>> A set of dichotomies hold most possible information if they
>> are  0.50, which is also where the Variance is most.

>Thanks for the pointers.
>Actually I was indeed thinking in terms of an information theory approach.
>But then the question arises: if a variable is deemed information rich
>(in terms of information theory), does that make it useful for a
>statistical model, or does the information-theoretic idea imply something
>else (i.e., is it not directly applicable in the context of statistical
>models)?

There are two types of information present here.  One is the
amount of information in the observations of the random
variables, essentially the number of bits a perfect random
device would need to produce the data.  The other is the
amount of information the data provide for making decisions
about the underlying parameters.
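
A rough sketch in Python of the contrast, taking a single
dichotomy, a Bernoulli(p) variable, purely for illustration:
the Shannon entropy (bits per observation needed to encode the
data) and the variance are both largest at p = 0.5, while the
Fisher information about p, 1/(p(1-p)), is smallest there.

import math

def shannon_entropy(p):
    # bits per observation needed to encode draws from Bernoulli(p)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def fisher_information(p):
    # expected information per observation about the parameter p itself
    return 1.0 / (p * (1 - p))

for p in (0.1, 0.3, 0.5):
    print(f"p={p}: entropy={shannon_entropy(p):.3f} bits, "
          f"variance={p * (1 - p):.3f}, Fisher info={fisher_information(p):.2f}")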

These are very much different.  To give an example, consider
a multinomial random variable with all the p_i non-zero.
Then the expected number of bits needed for a random device
which knows all the p_i to produce a sample of size n is at
least n*S, where S is the Shannon information per observation,
and less than 2 bits more than that.  On the other hand, there
is a sufficient statistic, namely the numbers n_i of times
each class i has occurred.  The Shannon information in the
n_i is only ((j-1)/2)*log_2(n) + O(1), where j is the number
of classes; the remaining Shannon information lies in the
order in which the observations occur given the n_i, and that
order is useless for inference about the p_i.
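
A small numeric check of these two quantities, assuming a
three-class multinomial with arbitrarily chosen p_i (the values
below are illustrative only): it compares n*S, the exact Shannon
information carried by the counts n_i, and ((j-1)/2)*log_2(n).

import math
from math import factorial, log2

def count_vectors(n, j):
    # all vectors (n_1, ..., n_j) of non-negative integers summing to n
    if j == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in count_vectors(n - k, j - 1):
            yield (k,) + rest

def counts_entropy(n, p):
    # exact Shannon information (in bits) in the sufficient statistic n_i
    h = 0.0
    for counts in count_vectors(n, len(p)):
        coef = factorial(n)
        for c in counts:
            coef //= factorial(c)
        prob = coef * math.prod(pi ** c for pi, c in zip(p, counts))
        if prob > 0:
            h -= prob * log2(prob)
    return h

p = (0.2, 0.3, 0.5)                    # assumed class probabilities
S = -sum(pi * log2(pi) for pi in p)    # Shannon information per observation
j = len(p)
for n in (10, 50, 200):
    print(f"n={n}: n*S={n * S:.1f} bits, "
          f"H(n_i)={counts_entropy(n, p):.2f} bits, "
          f"(j-1)/2*log2(n)={(j - 1) / 2 * log2(n):.2f} bits")
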
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
[EMAIL PROTECTED]         Phone: (765)494-6054   FAX: (765)494-0558
