Dr. Burrill has some good ideas. I will try to remember them.

Be sure that the same conventions are used both times the data are
entered.  If the same person does both entries (undesirable but
sometimes unavoidable), be sure that the two entries for the same case
are not done one right after the other.
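
Once both passes are keyed in, the two files can be compared cell by
cell.  What follows is only a rough Python sketch of that comparison,
not anything from SPSS; the IDs, the values, and the helper name
find_mismatches are all made up for illustration.

```python
# Hypothetical data: two independent entry passes, keyed by respondent ID.
first_pass = {113: [1, 0, 3, 4], 114: [0, 0, 3, 0]}
second_pass = {113: [1, 0, 3, 4], 114: [0, 2, 3, 0]}


def find_mismatches(a, b):
    """Return (id, position, value_a, value_b) for every cell that differs."""
    mismatches = []
    for case_id in sorted(set(a) | set(b)):
        row_a, row_b = a.get(case_id), b.get(case_id)
        if row_a is None or row_b is None:
            # The case was entered in only one of the two passes.
            mismatches.append((case_id, None, row_a, row_b))
            continue
        for pos, (va, vb) in enumerate(zip(row_a, row_b)):
            if va != vb:
                mismatches.append((case_id, pos, va, vb))
    return mismatches


mismatches = find_mismatches(first_pass, second_pass)
print(mismatches)
```

Every mismatch then gets resolved by going back to the paper form.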

If you didn't have control over the instrument design, be sure to add a
variable for "question seen but none checked".

In my experience, when the cell entry is something other than a simple
flag, it is less confusing and requires less cognitive processing to use
the suffix of the variable name as the cell entry.  (In SPSS it is easy
enough to recode to a dichotomous flag.)  So if the variables are q8_1
to q8_9, use 1 to 9 as the values.  If they are q8a to q8h, it has been
easier to use a to h as the values.
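
For anyone not working in SPSS, the recode to a dichotomous flag might
look something like this in Python.  This is a sketch under my own
assumptions: the helper names suffix_of and recode_to_flag are
hypothetical, and treating a blank cell as "not checked" is a
convention I am assuming here, not one stated above.

```python
def suffix_of(var_name):
    """Hypothetical helper: 'q8_3' -> '3', 'q8c' -> 'c'."""
    if "_" in var_name:
        return var_name.rsplit("_", 1)[1]
    return var_name[-1]


def recode_to_flag(var_name, cell):
    """Return 1 if the cell holds the variable's own suffix, 0 if blank.

    Anything else is treated as a data-entry error (an assumption of
    this sketch) and raises ValueError so it cannot pass silently.
    """
    text = str(cell).strip()
    if text == "":
        return 0
    if text == suffix_of(var_name):
        return 1
    raise ValueError(f"unexpected entry {text!r} in {var_name}")


print(recode_to_flag("q8_3", 3))   # cell matches the suffix
print(recode_to_flag("q8c", "c"))  # letter-suffixed variable
print(recode_to_flag("q8c", ""))   # blank cell
```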

You can keep the ID and the whole set of variables visible at the same
time by pinning the ID columns and scrolling over to the relevant
variables.

Hope this helps.

Art
[EMAIL PROTECTED]
Social Research Consultants
University Park, MD  USA
(301) 864-5570

Donald Burrill wrote:

OK.  Here's a way to do it, essentially the same as the method Art
Kendall was describing to you, but with some error-detection devices
built in.  Suppose Question 8 has 11 categories.  Then you need 11
variables to record them.  The combinations will then occur naturally,
you won't have to figure them out in advance.  I'll call the 11
variables Q8a, Q8b, ..., Q8k ("Q8" for "Question 8", "a" through "k" for
the possible responses "1" through "11").  Here's what several persons'
data on Question 8 might look like (I'm ignoring their data on all the
other questions, for lack of space).  Oh, and display this in a
monospaced font like Courier, else things won't line up vertically and
you'll have great difficulty perceiving what I want to show.  "ID" is
the person's ID, for which I'm arbitrarily using 3 digits;  you might
want to use fewer, or more, depending on your identification coding
scheme.

 ID ...  Q8a Q8b Q8c Q8d Q8e Q8f Q8g Q8h Q8i Q8j Q8k
 113      1   0   3   4   0   0   7   8   9   1   2
 114      0   0   3   0   0   0   7   0   9   0   2
 115      0   2   3   4   0   6   0   8   0   1   0
 116      0   0   0   0   5   0   0   0   9   1   0
 ... and so on

 Person # 113 checked categories 1, 3, 4, 7, 8, 9, 10, and 11.
 Person # 114 checked categories 3, 7, 9, and 11.
 Person # 115 checked categories 2, 3, 4, 6, 8, and 10.
 Person # 116 checked categories 5, 9, and 10.

I've entered "0" for the categories not checked.  In practice, I would
usually enter these as blanks (closer to what the questionnaire sheet
looks like, and easier for me to type), and convert the blanks to zeros
in the recoding section of my statistical program (MINITAB, SPSS, SAS,
or what-have-you).  But whatever system you use for data input may or
may not conveniently permit blanks;  in which case use zeroes.  The
fictitious data above look like this with blanks instead of zeroes:

 ID ...  Q8a Q8b Q8c Q8d Q8e Q8f Q8g Q8h Q8i Q8j Q8k
 113      1       3   4           7   8   9   1   2
 114              3               7       9       2
 115          2   3   4       6       8       1
 116                      5               9   1
 ... and so on

I deliberately do not enter "1" for every category, but only for
category 1 (or category 10), because now I can tell if I've erred in
entering data.  If a person checked category 5, there should be a "5"
for that person in variable Q8e.  If I find a "5" in one of the other
variables (Q8d or Q8f, for instance) I know that an error was made in
data entry.  Or if I find a "4" or a "6" or something else in variable
Q8e, I know there's an error.  I don't know whether the error was in
recording "4" when it should have been "5", or traveling one space too
far when the data-transcribing person (me?) was actually aiming for
variable Q8d, but I can consult the questionnaire answer sheets to find
that out.  And there won't be all that many errors, but there almost
certainly will be a few.  This device helps immensely in the
data-cleaning stage of one's work.
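
The check described above can be automated.  Here is a minimal Python
sketch of it, assuming the data have already been read into lists with
zeroes for unchecked categories.  The names NAMES, EXPECTED and
entry_errors are mine, and the row for person 116 contains a deliberate
slip (the "5" typed one column too far left) so that there is something
to catch.

```python
# Variable names Q8a..Q8k, in questionnaire order.
NAMES = ["Q8" + letter for letter in "abcdefghijk"]

# The one-digit code expected in each variable: 1-9 for categories 1-9,
# then 1 and 2 for categories 10 and 11.
EXPECTED = {name: (k if k <= 9 else k - 9)
            for k, name in enumerate(NAMES, start=1)}

rows = {
    113: [1, 0, 3, 4, 0, 0, 7, 8, 9, 1, 2],
    114: [0, 0, 3, 0, 0, 0, 7, 0, 9, 0, 2],
    115: [0, 2, 3, 4, 0, 6, 0, 8, 0, 1, 0],
    116: [0, 0, 0, 5, 0, 0, 0, 0, 9, 1, 0],  # deliberate slip: "5" in Q8d
}


def entry_errors(rows):
    """List (id, variable, value) for every cell that is neither 0
    nor the code that belongs in that variable."""
    errors = []
    for case_id, values in rows.items():
        for name, value in zip(NAMES, values):
            if value not in (0, EXPECTED[name]):
                errors.append((case_id, name, value))
    return errors


errors = entry_errors(rows)
print(errors)  # [(116, 'Q8d', 5)]
```

The check only says where something is wrong; which kind of slip it was
still has to be settled from the answer sheet.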

Notice also that just by printing variables Q8a through Q8k side by
side, you get an 11-digit number that displays, as clearly as one could,
which categories were checked by which persons.  (Of course, you have to
interpret the values "1" and "2" in variables Q8j and Q8k as "10" and
"11" respectively, but that's a small thing.  No need to enter more than
one digit for any response, and the "1" in variable Q8j is far enough
from variable Q8a that one is unlikely to be confusing them.)
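
If your package will not print the variables side by side, the same
display is easy to build by hand.  A small Python sketch, using the
fictitious data from the table above (the names rows, patterns and
checked_categories are just illustrative):

```python
rows = {
    113: [1, 0, 3, 4, 0, 0, 7, 8, 9, 1, 2],
    114: [0, 0, 3, 0, 0, 0, 7, 0, 9, 0, 2],
    115: [0, 2, 3, 4, 0, 6, 0, 8, 0, 1, 0],
    116: [0, 0, 0, 0, 5, 0, 0, 0, 9, 1, 0],
}

# Join the eleven one-digit entries into a single 11-character pattern.
patterns = {case_id: "".join(str(v) for v in values)
            for case_id, values in rows.items()}


def checked_categories(values):
    """Category numbers (1-11) read off from the positions of the
    nonzero entries, so Q8j and Q8k come back as 10 and 11."""
    return [k for k, v in enumerate(values, start=1) if v != 0]


for case_id, pattern in patterns.items():
    print(case_id, pattern, checked_categories(rows[case_id]))
```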

This also makes interpreting things a little clearer, e.g. you may want
to ask for a frequency count on variable Q8e:  how many persons checked
"5", and so on.  The output will tell you that there are 74 "5"s and 83
"0"s (or however many);  and if it tells you that there is also one "6",
aha! you've found another error!
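
Any statistical package will produce such a count.  Purely as an
illustration, here is the same idea in Python with a tiny made-up
column for Q8e (the seven values are invented, not real data):

```python
from collections import Counter

# Hypothetical column of entries for variable Q8e, one per respondent.
q8e = [0, 5, 5, 0, 6, 5, 0]

counts = Counter(q8e)

# In Q8e only 0 (not checked) and 5 (checked) are legitimate entries.
unexpected = sorted(v for v in counts if v not in (0, 5))

print(dict(counts))
print("values needing a look:", unexpected)
```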

For other statistical purposes than merely displaying the data and their
counts, you may want the value "1" in each of these variables (or in
corresponding variables, perhaps in another data file copied from this
one). Here's one way to do that:
 1.  First convert all the blanks (if that's how the non-selected
categories were entered) to zeroes.  (Use the recoding utility in your
statistical package for this.)
 2.  Define and store away in some convenient place a very small number,
which I'll call EPS (short for "epsilon", but it's not important what
you call it).  The number needs to be small enough that when it is added
to an integer (say, "1") the computer cannot distinguish between the sum
and the integer.  I used to find EPS = 1.0E-25 useful (that's
computerese for "1 times 10 to the minus 25th power", or  0.000...001
where there are 24 zeroes between the decimal point and "1").
 3.  For each variable, compute (e.g.) QQ8a = Q8a + EPS.
 4.  For each variable, compute  QQ8a = Q8a/QQ8a
 Now you will have either 0 or 1 in the variable named QQ8a:  zero if
Q8a was zero to begin with, 1 if it was anything else.  And you've
neatly avoided getting into the messy troubles that arise when one tries
to divide by zero, because the divisor (QQ8a) is either equal to the
dividend (Q8a) (so the quotient is 1), or the divisor is 1.0E-25 and the
dividend is 0 (so the quotient is 0).
 And you can conveniently do things like add QQ8a through QQ8k to find
out how many categories were checked by each person.
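
Steps 2 through 4 can be tried out in a few lines.  This is a sketch in
Python, whose numbers are double-precision floats; for those,
EPS = 1.0E-25 is comfortably small enough next to the entries 1 to 9.
In a package with different arithmetic the safe size of EPS may differ,
so treat that as an assumption to check.  The function name to_flag is
my own.

```python
EPS = 1.0e-25

# Step 2's requirement: adding EPS to an integer must not change it.
assert 1.0 + EPS == 1.0
assert 9.0 + EPS == 9.0


def to_flag(value):
    """Steps 3 and 4: divide the entry by (entry + EPS).

    Gives 0.0 when the entry is 0 and 1.0 when it is any of 1-9.
    """
    shifted = value + EPS      # step 3: QQ8a = Q8a + EPS
    return value / shifted     # step 4: QQ8a = Q8a / QQ8a


# Person 113's entries for Q8a..Q8k, blanks already recoded to zero.
q8 = [1, 0, 3, 4, 0, 0, 7, 8, 9, 1, 2]

flags = [to_flag(v) for v in q8]
n_checked = sum(flags)

print(flags)
print("categories checked:", n_checked)
```

Where the package has a logical comparison or a recode command, testing
"entry not equal to 0" directly gets the same flag without any
arithmetic; the EPS device is for when nothing but arithmetic is handy.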

This is probably approaching information overload, so I'll stop.  Go try
this on several persons' worth of several questions, creating a small
data file, and try running it through your statistical package for
frequency counts and simple data displays.  Anything that isn't clear
will make itself known in short order.
  Good luck!   -- Don Burrill.

On Sun, 14 Mar 2004, Rowney wrote:


Hi,
thanks for your reply, from the information i think because my
question is of the type "check all that apply" i would have to use
multiple dichotomies.  from my understanding then what i would do is
if someone had ticked box 4 and 6 i would create a 'dummy' variable in
the "value" box.

the problem which i foresee with this is that in my questionnaire
there are 11 categories, and people can tick as many as they
feel appropriate (and any combination), and from glancing through the
questionnaires i feel that it would be unfeasible to enter all the
combinations!

is there any way to avoid this, or have i misunderstood how to do
this?

Thanks for your help, i really do appreciate it
Rachel


 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110      (603) 626-0816
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================

