Hi Byron; thanks very much for this. We threw it in front of a
statistician, and got this:
With the caveat that I'm not actually a biostatistician, I think year
is the wrong "treatment" here. You either want to
1. binarize before/after the major shift in instructor training
procedure (note this has major confounding issues with time, and
thus with the popularity and size of Data Carpentry, etc.), or
2. compare all the actual training sessions. There are a lot of them,
which may destroy your power, but if you see some that are way
low, you can look at them and see if they were, e.g., all taught
by the same person, or have other characteristics in common.
To do this type of modelling I think I'd really want to have more
covariates to put in the model though, either at the training session
or trainee level. The more (true, relevant) information you give the
model the better it can answer your question, and it seems pretty
starved for info if you're JUST giving it year...
I can easily label sessions as "two-day" or "multi-week", which is the
major distinguishing characteristic. I don't think we'll get much
signal yet from labeling by instructor, since I taught or co-taught
everything before January, and we've only had 4 since then that were
solely taught by other people (a number I sincerely hope will go up).
But this is still pretty cool - I'll see if I can cook up a better data set.
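A rough sketch of what comparing labeled sessions could look like: a
hand-rolled Kaplan-Meier estimate computed separately for each session
format. Everything here is an assumption for illustration: the record
layout (format, days until the event, whether the event was observed),
the function names, and the toy numbers, which are made up and are not
the real training data. It is also not necessarily the method used in
the analysis at [1].

```python
from collections import defaultdict


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time each trainee was followed (hypothetical unit: days)
    observed:  True if the event happened, False if the record is censored
    """
    events = defaultdict(int)  # events observed at each time
    exits = defaultdict(int)   # everyone leaving the risk set at each time
    for t, e in zip(durations, observed):
        exits[t] += 1
        if e:
            events[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def kaplan_meier_by_group(records):
    """Group (label, duration, observed) records by label and fit each group."""
    grouped = defaultdict(lambda: ([], []))
    for label, duration, event in records:
        grouped[label][0].append(duration)
        grouped[label][1].append(event)
    return {label: kaplan_meier(d, e) for label, (d, e) in grouped.items()}


# Made-up toy records: (session format, days followed, event observed?)
toy_records = [
    ("two-day", 30, True),
    ("two-day", 45, True),
    ("two-day", 60, False),
    ("two-day", 90, True),
    ("multi-week", 20, True),
    ("multi-week", 50, False),
    ("multi-week", 50, True),
    ("multi-week", 120, False),
]

curves = kaplan_meier_by_group(toy_records)
for label, curve in curves.items():
    print(label, [(t, round(s, 3)) for t, s in curve])
```

With only a handful of sessions per label the curves will be very noisy,
so this is for eyeballing differences, not for testing them.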
Cheers,
Greg
On 2016-05-23 4:41 PM, Byron Smith wrote:
Could someone take a look at this survival analysis of the same data
[1]? I'm by no means an expert, so I'd like to know if I'm doing
anything obviously wrong.
[1]: http://bsmith89.github.io/swc-instructor-training-analysis/
--
Dr Greg Wilson
Director of Instructor Training
Software Carpentry Foundation