RE: clusters within a sample

Simon, Steve, PhD Tue, 08 Jan 2002 08:04:33 -0800

Yvonne Unrau writes:

>I am working with a large administrative data (N=1,086)
>set for a foster care agency. In short, I am comparing
>client outcomes across two branches (each is delivering
>a different service model). For analyses, I am using
>logistic regression (SPSS) where my dependent
>variables include a variety of outcomes measuring
>program success vs. failure. My test variable is the
>program (two groups), plus I have several other
>demographic and service related variables.
>My problem is that I have two types of "clusters" of
>children in my data set:
>siblings from the same biological family (may or may
>not be placed in the same foster home)
>foster children placed in one foster home (may or may
>not be siblings)
>I am looking for ways to test the amount of error
>associated with the above clusters using SPSS. My
>strategy to date has been to SELECT the restricted
>sample, run the LR analysis, then eyeball the results.
>What are my other options?

Wow! A messy data set. What fun!

The first thing you should do is get a handle on the size of your clusters. Do most clusters contain just one child, with clusters of two or more children being rare? Or is it the opposite case, where almost every cluster has two or more children in it?
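A quick tabulation is usually enough for this step. The sketch below is only an illustration: the records and the ID names (biological family, foster home) are made up, and your own file will have its own layout.

```python
from collections import Counter

# Hypothetical records: (child_id, bio_family_id, foster_home_id).
records = [
    (1, "F1", "H1"),
    (2, "F1", "H1"),
    (3, "F1", "H2"),
    (4, "F2", "H2"),
    (5, "F3", "H3"),
    (6, "F4", "H4"),
]

def cluster_size_distribution(cluster_ids):
    """Return {cluster size: number of clusters of that size}."""
    children_per_cluster = Counter(cluster_ids)   # cluster id -> child count
    return dict(sorted(Counter(children_per_cluster.values()).items()))

sibling_sizes = cluster_size_distribution(r[1] for r in records)
home_sizes = cluster_size_distribution(r[2] for r in records)

print("sibling clusters:", sibling_sizes)
print("foster home clusters:", home_sizes)
```

If nearly all of the count sits at size 1, there is very little within-cluster information to work with.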

If most of your clusters contain just one child, then it may make sense to lower your expectations. A binary dependent variable gives you relatively little information about variability (at least compared to a continuous variable), and you may be trying to estimate something without enough data to get a reasonable estimate.

Second, you need to understand how the data behave at a higher level. Create an aggregate variable across all members of each cluster (for example, the proportion of children in the cluster with a successful outcome) and then model that aggregate variable. This is tricky, and you may have to use a model which assumes nice normal residuals when your data are clearly non-normal. That's okay, because you are just trying to get a starting point for a more complex analysis.
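Here is a rough sketch of that aggregation, using the proportion of successes per foster home as the aggregate. The rows, the 0/1 coding, and the assumption that each foster home belongs to exactly one program are all illustrative, not taken from the actual data.

```python
from collections import defaultdict

# Hypothetical rows: (foster_home_id, program, success), success coded 0/1.
rows = [
    ("H1", "A", 1), ("H1", "A", 0),
    ("H2", "A", 1),
    ("H3", "B", 0), ("H3", "B", 0), ("H3", "B", 1),
    ("H4", "B", 1),
]

def aggregate_by_cluster(rows):
    """Collapse child-level rows to one summary per cluster."""
    totals = defaultdict(lambda: {"program": None, "n": 0, "successes": 0})
    for home, program, success in rows:
        t = totals[home]
        t["program"] = program        # assumes one program per home
        t["n"] += 1
        t["successes"] += success
    return {
        home: {**t, "proportion": t["successes"] / t["n"]}
        for home, t in totals.items()
    }

def mean_proportion_by_program(clusters):
    """Average the cluster-level proportions within each program."""
    by_program = defaultdict(list)
    for summary in clusters.values():
        by_program[summary["program"]].append(summary["proportion"])
    return {p: sum(v) / len(v) for p, v in sorted(by_program.items())}

clusters = aggregate_by_cluster(rows)
mean_by_program = mean_proportion_by_program(clusters)
print(mean_by_program)
```

The cluster-level proportions can then go into a simple comparison or regression, keeping in mind that clusters of different sizes are not equally precise.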

Third, you need to abandon SPSS and use software that can model random effects in a logistic regression analysis. The beta-binomial model is the one that was first developed for this kind of data, but other models have been used more recently. I think SAS and Stata can handle this type of analysis, and there is probably other software as well.
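To give a feel for what the beta-binomial model does, here is a minimal sketch of its log-likelihood for cluster-level counts, using only the Python standard library. The shape parameters a and b and the example counts are hypothetical, and this plain version has no covariates, so it is not a substitute for a full random effects logistic regression.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def beta_binomial_logpmf(k, n, a, b):
    """Log P(k successes among n children) when the cluster's success
    probability is itself drawn from a Beta(a, b) distribution."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)

def log_likelihood(clusters, a, b):
    """Sum the log-pmf over (successes, cluster size) pairs."""
    return sum(beta_binomial_logpmf(k, n, a, b) for k, n in clusters)

def intracluster_correlation(a, b):
    """Correlation between two children in the same cluster."""
    return 1.0 / (a + b + 1.0)

# Hypothetical (successes, cluster size) pairs.
example_clusters = [(1, 2), (0, 1), (3, 3), (2, 4)]
print(log_likelihood(example_clusters, a=1.0, b=1.0))
print(intracluster_correlation(1.0, 1.0))
```

Maximizing this log-likelihood over a and b (with any general-purpose optimizer) gives an estimate of the intracluster correlation, which is the "amount of error associated with the clusters" in the simplest setting.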

Fourth, you need to estimate each cluster effect separately first. Estimate the sibling effect while ignoring the foster family effect. If possible, randomly select only one child from each foster family and do the analysis with a random sibling effect. Then reverse the process: randomly select only one sibling from each biological family and estimate the foster family effect.
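The random selection is easy to script. This is just a sketch with made-up records and a hypothetical helper, one_per_cluster; the seed is fixed only so the selection can be reproduced.

```python
import random
from collections import defaultdict

# Hypothetical records: (child_id, bio_family_id, foster_home_id).
records = [
    (1, "F1", "H1"),
    (2, "F1", "H1"),
    (3, "F1", "H2"),
    (4, "F2", "H2"),
    (5, "F3", "H3"),
    (6, "F4", "H4"),
]

def one_per_cluster(records, key_index, seed=0):
    """Randomly keep exactly one record from each cluster.

    key_index picks the tuple position that identifies the cluster
    (1 = biological family, 2 = foster home in this layout).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec)
    return [rng.choice(members) for _, members in sorted(groups.items())]

# One child per foster home: use this subset to study the sibling effect.
one_per_home = one_per_cluster(records, key_index=2, seed=1)
# One child per biological family: use this subset for the foster home effect.
one_per_family = one_per_cluster(records, key_index=1, seed=1)

print(one_per_home)
print(one_per_family)
```

Note that the subset with one child per foster home may leave very few sibling groups of size two or more, which ties back to the cluster size question above.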

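Fifth, see if you can estimate both effects simultaneously. This model is very complex because the two kinds of clusters are crossed rather than nested (siblings may or may not share a foster home), and even software that can handle random effects in a logistic regression model may not be able to fit it.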
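It can help to see what such a model assumes before trying to fit one. The sketch below simulates outcomes from a logistic model with two crossed random intercepts, one per biological family and one per foster home. It only generates data; it does not fit anything. All parameter values, names, and records are hypothetical.

```python
import math
import random

# Hypothetical children: (bio_family_id, foster_home_id, program coded 0/1).
children = [
    ("F1", "H1", 0), ("F1", "H1", 0), ("F1", "H2", 0),
    ("F2", "H2", 0), ("F3", "H3", 1), ("F4", "H4", 1),
]

def simulate_crossed(children, beta0, beta_program, sd_family, sd_home, seed=0):
    """Simulate binary outcomes where
    logit(p) = beta0 + beta_program * program + u[family] + v[home],
    with u and v drawn independently from normal distributions."""
    rng = random.Random(seed)
    families = sorted({c[0] for c in children})
    homes = sorted({c[1] for c in children})
    u = {f: rng.gauss(0.0, sd_family) for f in families}   # sibling effect
    v = {h: rng.gauss(0.0, sd_home) for h in homes}        # foster home effect
    simulated = []
    for family, home, program in children:
        eta = beta0 + beta_program * program + u[family] + v[home]
        p = 1.0 / (1.0 + math.exp(-eta))
        y = 1 if rng.random() < p else 0
        simulated.append((family, home, program, p, y))
    return simulated

for row in simulate_crossed(children, beta0=0.2, beta_program=0.5,
                            sd_family=1.0, sd_home=0.5, seed=42):
    print(row)
```

Running candidate software on simulated data like this, where the true variances are known, is one way to check whether it can recover both effects at your sample size.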

You may want to become friends with someone in the Statistics Department at your university. This is a very tricky analysis.

Good luck!

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.

The STATS web page has moved to

http://www.childrens-mercy.org/stats
