If you don't mind, I'm going to copy this back to the EDSTAT group.
Fahd wrote:
>
> Sorry for not being clear on that. The problem is very simple. We have a
> room with all sorts of sensors (light, weight, software sensors, etc.) and
> we have records of the activities people are doing along with readings
> from the sensors. We want to identify the correlations between the sensors
> and the activities, for example, if it is say a presentation then the
> projector sensor should show a high correlation with the activity. We are
> of course representing the activities with discrete values. Any thoughts
> about the implications of using regression?
OK - the activities are presumably not just discrete but dichotomous
(or categorical, if they are mutually exclusive). You'd code each one as
1 (the activity is going on) or 0 (it isn't).
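A minimal sketch of that 0/1 coding for mutually exclusive activities, in Python. The helper name `indicator_columns` and the activity labels are hypothetical, made up for illustration; each distinct activity gets its own column that is 1 on the records where it was observed and 0 elsewhere.

```python
from typing import Dict, List


def indicator_columns(activities: List[str]) -> Dict[str, List[int]]:
    """Code a sequence of activity labels as one 0/1 column per activity.

    Hypothetical helper for illustration: each distinct label becomes a
    column that is 1 where that activity was recorded and 0 elsewhere.
    """
    labels = sorted(set(activities))
    return {
        label: [1 if a == label else 0 for a in activities]
        for label in labels
    }


# Hypothetical records: one activity label per time step.
records = ["presentation", "meeting", "presentation", "idle"]
columns = indicator_columns(records)
print(columns["presentation"])
```

Each column can then serve as the 0/1 response in its own regression against the sensor readings.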
You might want to look at the possibility of using logistic regression
here. It's available in many packages, including MINITAB.
Logistic regression, like most regression techniques including Ordinary
Least Squares, is not really about fitting a line to the data. It's
about fitting a line to a parameter value that (a) describes the
distribution of the dependent variable and (b) changes as a function of
the other variables. In the OLS case the parameter is the mean of the
dependent variable, which is assumed to have a normal distribution for
any particular set of values of the independent variables. As the mean
of a normal distribution is also the mode (the point of greatest
density), the data cluster around that line.
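For a single explanatory variable x, the OLS model just described can be written as follows (beta_0, beta_1 and sigma^2 are the conventional symbols for the intercept, slope and error variance, chosen here for illustration):

```latex
Y \mid x \;\sim\; N\!\left(\mu(x),\, \sigma^2\right), \qquad
\mu(x) = \beta_0 + \beta_1 x .
```

The fitted line is an estimate of mu(x), the parameter, not of the individual observations.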
In the logistic case, the data are 1 or 0. The assumed distribution,
for any given set of independent variables, is Bernoulli [like the
mythical "weighted coin", it gives one of two outcomes with a fixed
probability, not necessarily 1/2]. The parameter is this fixed
probability; and the curve that is fit describes not the location of
the data but the probability of getting one or the other outcome. So
(in an example I once saw demonstrated at a teaching workshop) the
experiment might involve shooting a basketball into a basket. At any
distance, you have a certain probability of sinking the shot. The goal
is - on the basis of experimental data - to infer the probability as a
function of distance.
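Writing d for the distance and p(d) for the probability of sinking the shot from there (symbols introduced here for illustration), the model for a single shot is:

```latex
Y \mid d \;\sim\; \mathrm{Bernoulli}\bigl(p(d)\bigr), \qquad
P(Y = 1 \mid d) = p(d), \quad P(Y = 0 \mid d) = 1 - p(d).
```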
Now, this isn't going to be a straight line. Among other things, a
straight line would give negative probabilities beyond a certain
distance & probabilities greater than 1 if very close. False Zen. So
logistic regression fits the best (maximum-likelihood) "logistic function" as the model for
the probability. Details are a bit gnarly, but this curve has 0 and 1 as
horizontal asymptotes & makes a smooth transition between them. Your
stats package will handle the details. Explanatory variables may be
discrete or continuous.
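To make the "gnarly details" a little more concrete, here is a minimal sketch (not how any particular package does it) of fitting p(x) = 1 / (1 + exp(-(b0 + b1*x))) to 0/1 data by maximum likelihood, using Newton-Raphson iterations. The function names `logistic` and `fit_logistic` and the shot data are hypothetical, made up for illustration; real packages add safeguards, such as convergence checks and handling of perfectly separated data, that this sketch omits.

```python
import math
from typing import List, Tuple


def logistic(z: float) -> float:
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: List[float], y: List[int], iters: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(y = 1 | x) = logistic(b0 + b1 * x).

    Newton-Raphson on the Bernoulli log-likelihood, starting from (0, 0).
    Assumes the 0/1 outcomes are not perfectly separated by x; otherwise
    the maximum-likelihood estimate does not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient (score) of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negated Hessian
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (negated Hessian) * step = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical shots: distance from the basket, 1 = basket, 0 = miss.
distance = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
made = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
b0, b1 = fit_logistic(distance, made)
for d in (1, 3.5, 6):
    print(d, round(logistic(b0 + b1 * d), 3))
```

With these made-up data the fitted slope b1 comes out negative, so the estimated probability falls as the distance grows, and it always stays strictly between 0 and 1, unlike a straight line.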
=========================================P=1
*********
         ******
               **
                 *
                  **
                    ******
                          ************
=========================================P=0
-RD
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================