Life is way too complicated. advice needed on dealing with a complicated dataset

Harris, Betty A Thu, 19 Sep 2002 07:49:32 -0700

Hi all, 

I'm trying to decide how to approach analyzing a new dataset and thought
this might be a good place to ask for advice.


We tested K-3 students from demographically similar high-poverty,
low-performing schools on their reading skills in fall 2001 and again in
spring 2002. In the "treatment" schools the teachers are participating in a
technical assistance (TA) program designed to make them better at teaching
reading.  The comparison schools are demographically similar schools where
teachers are not implementing the TA reading program.  The goal is to see
whether the TA program produced larger gains in student reading skills by the
end of the school year.  Most of our dependent measures are interval-level
data.  The sample size is around 200 kids per group.

The design has a repeated-measures factor (time 1 vs. time 2)
and a between-groups factor (treatment vs. comparison).

My dilemma is that when I look at grade by group student demographics, I'm 
finding statistically significant differences in student demographics 
between the two groups of students  -bummer.  
I have a larger proportion of Hispanic/limited English proficiency kids in
the treatment group. 
-and- 
those demographics are correlated with the dependent variables. 
-and- 
what's really rude is kids with some of those demographics show more gain on
the lower level reading skills, but show less gain on the more complex
reading skills.
-and- 
to top off the dilemma, the demographics are correlated with each 
other -like I needed more complexity. 
For example, the proportion of Hispanic kids is highly correlated with the
proportion of limited English proficiency (LEP) kids, because about two-thirds
of the Hispanic kids are also LEP (only about 1% of non-Hispanic kids are LEP).
The other demographics that differ between the groups are dichotomous
variables that represent student race (e.g. white or not, black or not, etc.)
and special education status.
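
To put a rough number on the Hispanic/LEP overlap, here is a minimal sketch
that turns the rates quoted above into a phi (Pearson) correlation between the
two 0/1 indicators. The 38% Hispanic share is the kindergarten treatment
figure given further down; the function name is purely illustrative, and the
two-thirds and 1% rates are the approximate values from the post, so the
result is only indicative.

```python
import math

def phi_from_rates(p_x, p_y_given_x, p_y_given_not_x):
    """Phi correlation between two 0/1 variables X and Y.

    p_x             : P(X = 1), e.g. share of Hispanic students
    p_y_given_x     : P(Y = 1 | X = 1), e.g. share of Hispanic students who are LEP
    p_y_given_not_x : P(Y = 1 | X = 0), e.g. share of non-Hispanic students who are LEP
    Returns (phi, marginal P(Y = 1)).
    """
    p_xy = p_x * p_y_given_x                   # P(X = 1 and Y = 1)
    p_y = p_xy + (1 - p_x) * p_y_given_not_x   # marginal P(Y = 1)
    cov = p_xy - p_x * p_y                     # covariance of the two indicators
    sd_product = math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    return cov / sd_product, p_y

# Approximate rates from the post (kindergarten treatment group).
phi, p_lep = phi_from_rates(p_x=0.38, p_y_given_x=2 / 3, p_y_given_not_x=0.01)
print(f"implied LEP share: {p_lep:.3f}")
print(f"phi(Hispanic, LEP): {phi:.2f}")
```

The implied LEP share also serves as a consistency check against the reported
group percentages. A phi this large suggests the two covariates carry
overlapping information, which is the collinearity worry raised here.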

To get a little more concrete, the kindergarten treatment group has
proportionally
- 14 percentage points more Hispanics (38% vs. 24%)
- 9 points more LEP  (25% vs. 16%)
- 14 points fewer whites (50% vs. 64%)
- 16 points more "Other" race (38% vs. 22%).  This difference is an artifact
of using separate race and ethnicity questions: 99% of the kids who were coded
"other race" were coded ethnicity=Hispanic.

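As an illustration of how group differences like the kindergarten ones can be
checked, here is a minimal two-proportion z-test sketch. The per-group count
of 200 is an assumption: the post gives "around 200 kids per group" without
saying whether that is per grade, so the z and p values printed here are
placeholders, not results from the actual data.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# ASSUMPTION: 200 students per group; the real kindergarten counts are not given.
N_PER_GROUP = 200

# (label, treatment proportion, comparison proportion) as quoted in the post.
differences = [
    ("Hispanic", 0.38, 0.24),
    ("LEP", 0.25, 0.16),
    ("White", 0.50, 0.64),
]

for label, p_treat, p_comp in differences:
    z, p = two_proportion_z(p_treat, N_PER_GROUP, p_comp, N_PER_GROUP)
    print(f"{label:8s} z = {z:+.2f}, two-sided p = {p:.4f}")
```

With smaller per-grade groups the same percentage gaps would give smaller z
values, so the real counts matter.
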
For each grade level, I'm thinking of doing a 2x2 between x within (BxW)
ANCOVA and including the demographics that are significantly different between
the two groups as covariates.  I think I can safely assume that ethnicity and
race=other are telling me the same thing, and exclude race=other as a
covariate.  Given the issues above (especially the LEP/Hispanic correlation),
what do you think the best way to proceed is?
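
A rough sketch of what a covariate-adjusted comparison along these lines could
look like, on simulated data only. It recasts the group-by-time question as a
regression of gain scores (spring minus fall) on a treatment indicator plus
Hispanic and LEP indicators. That is a simplification swapped in for
illustration, not necessarily the exact 2x2 BxW ANCOVA. Every number in
`simulate` (effect sizes, noise level, group size) is made up, and the small
`ols` helper is hypothetical, written in plain Python so the block stands
alone.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

TRUE_EFFECT = 2.0  # made-up treatment effect on the gain score

def simulate(n_per_group=200, seed=0):
    """Synthetic gain scores with Hispanic/LEP rates loosely based on the post."""
    rng = random.Random(seed)
    rows, gains = [], []
    for treated, p_hisp in ((1, 0.38), (0, 0.24)):
        for _ in range(n_per_group):
            hisp = 1 if rng.random() < p_hisp else 0
            lep = 1 if rng.random() < (2 / 3 if hisp else 0.01) else 0
            # All coefficients below are invented for the simulation.
            gain = (10.0 + TRUE_EFFECT * treated + 1.0 * hisp - 3.0 * lep
                    + rng.gauss(0, 2.0))
            rows.append([1.0, float(treated), float(hisp), float(lep)])
            gains.append(gain)
    return rows, gains

X, y = simulate()
b_adj = ols(X, y)                               # intercept, treatment, Hispanic, LEP
b_naive = ols([[r[0], r[1]] for r in X], y)     # intercept, treatment only

print("adjusted treatment effect:  ", round(b_adj[1], 2))
print("unadjusted treatment effect:", round(b_naive[1], 2))
```

Because Hispanic and LEP overlap heavily, their individual coefficients will
tend to be imprecise, but the treatment coefficient can still be estimated as
long as the two covariates are not perfectly collinear. A real analysis would
also need standard errors and a check of the ANCOVA assumptions, which this
sketch leaves out.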

Also, any ideas how I can deal with the above issues for my categorical
dependent measures, other than doing separate analyses (e.g. chi-square) on
groups disaggregated by student demographics?

Out for now, 
Betty 

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
