Be sure that the same conventions are used both times that the data are entered. If the same person does both entries (undesirable but sometimes unavoidable), be sure that the two entries for the same case are not done one right after the other.
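For what it's worth, here is a minimal Python sketch of how the two entry passes might be compared afterwards. Nothing here comes from a particular package: the function name compare_entries, the dictionary layout (case id -> variable -> value) and the sample values are all assumptions made up for illustration.

```python
def compare_entries(first_pass, second_pass):
    """Return (case_id, variable, value1, value2) for every disagreement.

    Both arguments are hypothetical dicts of the form
    {case_id: {variable_name: value}}.
    """
    mismatches = []
    for case_id in sorted(set(first_pass) | set(second_pass)):
        row1 = first_pass.get(case_id, {})
        row2 = second_pass.get(case_id, {})
        for var in sorted(set(row1) | set(row2)):
            v1 = row1.get(var)  # None if the cell is missing in pass 1
            v2 = row2.get(var)  # None if the cell is missing in pass 2
            if v1 != v2:
                mismatches.append((case_id, var, v1, v2))
    return mismatches


# Made-up example: the two passes disagree on one cell of case 114.
first_pass = {113: {"q8_1": 1, "q8_2": 0}, 114: {"q8_1": 0, "q8_2": 2}}
second_pass = {113: {"q8_1": 1, "q8_2": 0}, 114: {"q8_1": 0, "q8_2": 0}}

for mismatch in compare_entries(first_pass, second_pass):
    print(mismatch)
```

Every tuple it reports is a cell to look up on the paper questionnaire.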
If you didn't have control over the instrument design, be sure to add a variable for "question seen but none checked".
In my experience, when we used something in the cell that is not a simple flag, it was less confusing and required less cognitive processing to use the suffix of the variable name as the cell entry. (In SPSS it is easy enough to recode to a dichotomous flag.) So if the variables are q8_1 to q8_9, use 1 to 9 as the values. If they are q8a to q8h, it has been easier to use a to h as the values.
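The recode to a dichotomous flag would normally be done inside SPSS itself. Purely as an illustration of the idea, here is a small Python sketch. The function name flags_from_suffix_codes and the sample row are hypothetical, and it assumes that a blank cell means "not checked".

```python
def flags_from_suffix_codes(row, prefix="q8"):
    """Turn suffix-coded cells (e.g. 'a' in q8a, 3 in q8_3) into 0/1 flags."""
    flags = {}
    for name, value in row.items():
        if not name.startswith(prefix):
            continue  # skip the id and the other questions
        entered = "" if value is None else str(value).strip()
        # Assumption: a blank cell means the box was not checked.
        flags[name] = 0 if entered == "" else 1
    return flags


# Made-up case: boxes a and c were checked, b and d were not.
row = {"id": 7, "q8a": "a", "q8b": "", "q8c": "c", "q8d": None}
print(flags_from_suffix_codes(row))
```

The same function works unchanged for the q8_1 to q8_9 naming, since it only looks at whether something was entered.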
You can keep the set visible simultaneously by pinning the id info and sliding over to the relevant variables.
Hope this helps.
Art [EMAIL PROTECTED] Social Research Consultants University Park, MD USA (301) 864-5570
Donald Burrill wrote:
OK. Here's a way to do it, essentially the same as the method Art Kendall was describing to you, but with some error-detection devices built in. Suppose Question 8 has 11 categories. Then you need 11 variables to record them. The combinations will then occur naturally; you won't have to figure them out in advance. I'll call the 11 variables Q8a, Q8b, ..., Q8k ("Q8" for "Question 8", "a" through "k" for the possible responses "1" through "11"). Here's what several persons' data on Question 8 might look like (I'm ignoring their data on all the other questions, for lack of space). Oh, and display this in a monospaced font like Courier, or else things won't line up vertically and you'll have great difficulty perceiving what I want to show. "ID" is the person's ID, for which I'm arbitrarily using 3 digits; you might want to use fewer, or more, depending on your identification coding scheme.
ID  ... Q8a Q8b Q8c Q8d Q8e Q8f Q8g Q8h Q8i Q8j Q8k
113      1   0   3   4   0   0   7   8   9   1   2
114      0   0   3   0   0   0   7   0   9   0   2
115      0   2   3   4   0   6   0   8   0   1   0
116      0   0   0   0   5   0   0   0   9   1   0
... and so on
Person # 113 checked categories 1, 3, 4, 7, 8, 9, 10, and 11.
Person # 114 checked categories 3, 7, 9, and 11.
Person # 115 checked categories 2, 3, 4, 6, 8, and 10.
Person # 116 checked categories 5, 9, and 10.
I've entered "0" for the categories not checked. In practice, I would usually enter these as blanks (closer to what the questionnaire sheet looks like, and easier for me to type), and convert the blanks to zeros in the recoding section of my statistical program (MINITAB, SPSS, SAS, or what-have-you). But whatever system you use for data input may or may not conveniently permit blanks; in which case use zeroes. The fictitious data above look like this with blanks instead of zeroes:
ID  ... Q8a Q8b Q8c Q8d Q8e Q8f Q8g Q8h Q8i Q8j Q8k
113      1       3   4           7   8   9   1   2
114              3               7       9       2
115          2   3   4       6       8       1
116                      5               9   1
... and so on
I deliberately do not enter "1" for every category, but only for category 1 (or category 10), because now I can tell if I've erred in entering data. If a person checked category 5, there should be a "5" for that person in variable Q8e. If I find a "5" in one of the other variables (Q8d or Q8f, for instance) I know that an error was made in data entry. Or if I find a "4" or a "6" or something else in variable Q8e, I know there's an error. I don't know whether the error was in recording "4" when it should have been "5", or traveling one space too far when the data-transcribing person (me?) was actually aiming for variable Q8d, but I can consult the questionnaire answer sheets to find that out. And there won't be all that many errors, but there almost certainly will be a few. This device helps immensely in the data-cleaning stage of one's work.
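A check of this kind is easy to automate. The following is only a sketch in Python, not something from any statistical package: the names VARIABLES, EXPECTED and find_entry_errors are made up for the illustration, and the rule "categories 10 and 11 are entered as 1 and 2" is hard-coded for the 11-category example.

```python
# Variable names Q8a ... Q8k for categories 1 ... 11.
VARIABLES = ["Q8" + letter for letter in "abcdefghijk"]

# One-digit code expected in each variable: 1..9, then 1 and 2 for 10 and 11.
EXPECTED = {
    var: (cat if cat <= 9 else cat - 9)
    for cat, var in enumerate(VARIABLES, start=1)
}


def find_entry_errors(case_id, row):
    """List cells holding anything other than 0 or the variable's own code."""
    errors = []
    for var in VARIABLES:
        value = row.get(var, 0)  # a missing cell is treated as unchecked
        if value not in (0, EXPECTED[var]):
            errors.append((case_id, var, value, EXPECTED[var]))
    return errors


# Person 116 entered correctly, and again with the "5" one column too far left.
good = dict(zip(VARIABLES, [0, 0, 0, 0, 5, 0, 0, 0, 9, 1, 0]))
bad = dict(zip(VARIABLES, [0, 0, 0, 5, 0, 0, 0, 0, 9, 1, 0]))

print(find_entry_errors(116, good))
print(find_entry_errors(116, bad))
```

The second call flags the "5" sitting in Q8d (where only 0 or 4 belongs). As said above, the program cannot tell which kind of slip it was; that still takes a look at the answer sheet.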
Notice also that just by printing variables Q8a through Q8k side by side, you get an 11-digit number that displays, as clearly as one could, which categories were checked by which persons. (Of course, you have to interpret the values "1" and "2" in variables Q8j and Q8k as "10" and "11" respectively, but that's a small thing. No need to enter more than one digit for any response, and the "1" in variable Q8j is far enough from variable Q8a that one is unlikely to be confusing them.)
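If your package won't print the variables side by side for you, the same display is easy to build. Here is a rough Python sketch; the helper names response_pattern and checked_categories are assumptions of mine, and the data are person 113 from the fictitious example above.

```python
VARIABLES = ["Q8" + letter for letter in "abcdefghijk"]


def response_pattern(row):
    """Join the 11 one-digit entries into a single string, Q8a first."""
    return "".join(str(row.get(var, 0)) for var in VARIABLES)


def checked_categories(pattern):
    """Read a pattern back: any nonzero digit in position n means category n."""
    return [pos for pos, digit in enumerate(pattern, start=1) if digit != "0"]


person_113 = dict(zip(VARIABLES, [1, 0, 3, 4, 0, 0, 7, 8, 9, 1, 2]))
pattern = response_pattern(person_113)
print(pattern)
print(checked_categories(pattern))
```

Reading the categories back by position, rather than by digit, takes care of the "1" and "2" in the last two places standing for 10 and 11.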
This also makes interpreting things a little clearer. For example, you may want to ask for a frequency count: how many persons checked "5", and so on. The output will tell you that there are 74 "5"s and 83 "0"s (or however many); and if it tells you that there is also one "6", aha! You've found another error!
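Any statistical package will produce such a count directly. Just to show the idea, here is a small Python sketch using the standard library's collections.Counter; the column of Q8e values is invented for the illustration.

```python
from collections import Counter

# Made-up Q8e column for seven persons; only 0 and 5 are legitimate here.
q8e_column = [0, 5, 0, 5, 6, 0, 5]

counts = Counter(q8e_column)
for value, n in sorted(counts.items()):
    print(value, n)

# Anything other than 0 or 5 in Q8e points to a data-entry error.
unexpected = {value: n for value, n in counts.items() if value not in (0, 5)}
print(unexpected)
```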
For other statistical purposes than merely displaying the data and their counts, you may want the value "1" in each of these variables (or in corresponding variables, perhaps in another data file copied from this one). Here's one way to do that:
1. First convert all the blanks (if that's how the non-selected categories were entered) to zeroes. (Use the recoding utility in your statistical package for this.)
2. Define and store away in some convenient place a very small number, which I'll call EPS (short for "epsilon", but it's not important what you call it). The number needs to be small enough that when it is added to an integer (say, "1") the computer cannot distinguish between the sum and the integer. I used to find EPS = 1.0E-25 useful (that's computerese for "1 times 10 to the minus 25th power", or 0.000...001 where there are 24 zeroes between the decimal point and "1").
3. For each variable, compute (e.g.) QQ8a = Q8a + EPS.
4. For each variable, compute QQ8a = Q8a/QQ8a.
Now you will have either 0 or 1 in the variable named QQ8a: zero if Q8a was zero to begin with, 1 if it was anything else. And you've neatly avoided getting into the messy troubles that arise when one tries to divide by zero, because the divisor (QQ8a) is either equal to the dividend (Q8a) (so the quotient is 1), or the divisor is 1.0E-25 and the dividend is 0 (so the quotient is 0). And you can conveniently do things like add QQ8a through QQ8k to find out how many categories were checked by each person.
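Here is a quick way to convince yourself that the arithmetic behaves as described. It is only a sketch in Python, which is my choice for the illustration rather than one of the packages mentioned; it assumes ordinary double-precision floating point and one-digit nonnegative codes, and the function name to_indicator is made up.

```python
EPS = 1.0e-25  # far too small to change any of the codes 1..9 when added


def to_indicator(value):
    """Steps 3 and 4: divide the entry by (entry + EPS) to get 0 or 1."""
    divisor = value + EPS      # equals value for 1..9, equals EPS for 0
    return value / divisor     # 1.0 for a nonzero code, 0.0 for zero


# Person 113 from the example, blanks already converted to zeroes (step 1).
person_113 = [1, 0, 3, 4, 0, 0, 7, 8, 9, 1, 2]
indicators = [to_indicator(v) for v in person_113]

print(indicators)
print(sum(indicators))  # number of categories this person checked
```

In a language with comparison operators, something like int(value != 0) would give the same 0/1 result; the EPS device is handy when only arithmetic is available in the recoding step.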
This is probably approaching information overload, so I'll stop. Go try this on several persons' worth of several questions, creating a small data file, and try running it through your statistical package for frequency counts and simple data displays. Anything that isn't clear will make itself known in short order. Good luck! -- Don Burrill.
On Sun, 14 Mar 2004, Rowney wrote:
Hi, thanks for your reply, from the information i think because my question is of the type "check all that apply" i would have to use multiple dichotomies. from my understanding then what i would do is if someone had ticked box 4 and 6 i would create a 'dummy' variable in the "value" box.
the problem which i foresee with this is that in my questionnaire there are 11 categories, and people can tick as many as they feel appropriate (in any combination), and from glancing through the questionnaires i feel that it would be unfeasible to enter all the combinations!
is there any way to avoid this, or have i misunderstood how to do this?
Thanks for your help, i really do appreciate it Rachel
------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
=================================================================
Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at: http://jse.stat.ncsu.edu/
=================================================================