Is there a problem with this kind of comparison?

Lucas Wells Thu, 03 Jan 2002 21:51:39 -0800

Hello All,

Let me apologise in advance -- I assume I am about to raise a very
simple / silly topic, but I don't have a background in Stats, so it's
hard for me to tell a sensible question from one that isn't.


Let me set the scene for my question: I work in a large business that
loves to circulate reports (which large business doesn't?). I have
encountered the situation below more times than I can recall, and each
time I see it I feel vaguely uneasy about it. I don't have the
background to say, "that's an error" or to say "nope, what they're
doing makes sense", so I thought I'd ask in here.

Okay, now for a fictional example... Let's say I receive a report that
contains some basic analysis of errors encountered on orders issued by
a workforce:

Let's say in August 2001 there were 10,000 orders issued and 2000 had
errors. 750 of the errors were in the name field, 750 of them were in
the address field and 500 were in the products ordered field. (Note:
in a real example, an order could have an error in more than one
field.)

In September 2001 there were 9000 orders issued, 2500 of which had
errors. Of the orders with errors, 1000 were in the name field, 950
were in the address field and 550 were in the products ordered field.

In October 2001 there were 9500 orders issued, 2250 of which had
errors. Of the orders with errors, 1100 were in the name field, 700
were in the address field, and 450 were in the products ordered field.

So, what I often see, then, is:

Orders (note: presented as Aug, Sep, Oct):

Orders Issued: 10000, 9000, 9500
Orders With Errors: 2000, 2500, 2250
% Orders With Errors: 20%, 27.78%, 23.68%

Fields With Errors:

Name Field: 750, 1000, 1100
Address Field: 750, 950, 700
Products Ordered: 500, 550, 450

% Fields With Errors (ie Number of Field Errors Divided By Total
Errors for that month)

Name Field: 37.50%, 40%, 48.89%
Address Field: 37.50%, 38%, 31.11%
Products Ordered: 25%, 22%, 20%
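
To make the arithmetic behind these tables explicit, here is a minimal
Python sketch of how I understand the percentages to be calculated. The
names (orders_issued, field_share and so on) are my own invention for
illustration, and the counts are the ones that agree with the
percentages shown above.

```python
# Fictional monthly figures from the example (Aug, Sep, Oct).
orders_issued = {"Aug": 10000, "Sep": 9000, "Oct": 9500}
orders_with_errors = {"Aug": 2000, "Sep": 2500, "Oct": 2250}
field_errors = {
    "Name Field": {"Aug": 750, "Sep": 1000, "Oct": 1100},
    "Address Field": {"Aug": 750, "Sep": 950, "Oct": 700},
    "Products Ordered": {"Aug": 500, "Sep": 550, "Oct": 450},
}


def error_rate(month):
    """Percentage of orders issued in a month that had an error."""
    return 100 * orders_with_errors[month] / orders_issued[month]


def field_share(field, month):
    """Percentage of that month's errors that fell in the given field."""
    return 100 * field_errors[field][month] / orders_with_errors[month]


for month in orders_issued:
    shares = {f: round(field_share(f, month), 2) for f in field_errors}
    # The shares are parts of one whole, so they always total 100%.
    print(month, round(error_rate(month), 2), shares,
          round(sum(shares.values()), 2))
```

Note that the first percentage is taken against orders issued, while the
field percentages are taken against that month's total errors, so the
three field percentages always add up to 100 in every month.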

Now, here's what makes me uneasy -- usually, where I see a table
containing percentages of a total over a period of time, as in '%
Fields With Errors' directly above, I will also see a graph with all
three data elements plotted (ie Name Field, Address Field, Products
Ordered) across the period examined (ie Aug, Sep, Oct) with some
commentary like:

"We can see from Sep to Oct that the percentage of errors in the Name
Field increased, however we managed to decrease the percentage of
errors in the Products Ordered field."

Now, I look at these percentages and I think to myself, 'They're
percentages of a whole. If one goes up, then another must fall. It
doesn't seem to make sense to examine them as if they are measures
that can be separately influenced (ie, as if we could decrease
percentages across the board).'

Is this a legitimate concern?

I could understand it if one type of error were more 'important' than
another; then perhaps you would be trying to minimise the percentage
of that particular error. But you would expect the others to inflate
as a result, yes?
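
To illustrate what I mean by the others inflating, here is a small
Python sketch. The October counts are from my fictional example; the
second scenario, in which Products Ordered errors are halved while the
other two counts stay exactly the same, is purely hypothetical and
invented for illustration, as is the helper name shares.

```python
def shares(counts):
    """Return each field's percentage of the total error count."""
    total = sum(counts.values())
    return {field: round(100 * n / total, 2) for field, n in counts.items()}


# October error counts from the fictional example.
october = {"Name Field": 1100, "Address Field": 700, "Products Ordered": 450}

# Hypothetical scenario: Products Ordered errors are halved,
# while Name and Address error counts do not change at all.
hypothetical = {**october, "Products Ordered": 225}

print(shares(october))
# {'Name Field': 48.89, 'Address Field': 31.11, 'Products Ordered': 20.0}
print(shares(hypothetical))
# {'Name Field': 54.32, 'Address Field': 34.57, 'Products Ordered': 11.11}
```

In the hypothetical scenario the number of Name Field errors is
unchanged at 1100, yet its percentage rises from 48.89% to 54.32%,
simply because the total it is divided by has shrunk.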

Note: I know this is a very long-winded post, but I was having trouble
verbalising my concern with brevity.

Also: does anyone know of a book / web site that points out common
business statistical analysis errors?

Any help appreciated!

Many thanks,

LW


=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
                  http://jse.stat.ncsu.edu/
=================================================================
