Hello All, Let me apologise in advance -- I assume I am about to raise a very simple / silly topic, but I don't have a background in Stats, so it's hard for me to tell a sensible question from one that isn't.
Let me set the scene for my question: I work in a large business that loves to circulate reports (which large business doesn't?). I have encountered the situation below more times than I can recall, and each time I see it I feel vaguely uneasy about it. I don't have the background to say "that's an error" or "nope, what they're doing makes sense", so I thought I'd ask in here.

Okay, now for a fictional example. Let's say I receive a report that contains some basic analysis of errors encountered on orders issued by a workforce (note: in a real example, an order could have an error in more than one field):

In August 2001 there were 10,000 orders issued, 2000 of which had errors. Of the errors, 750 were in the name field, 750 were in the address field and 500 were in the products ordered field.

In September 2001 there were 9000 orders issued, 2500 of which had errors. Of the errors, 1000 were in the name field, 950 were in the address field and 550 were in the products ordered field.

In October 2001 there were 9500 orders issued, 2250 of which had errors. Of the errors, 1100 were in the name field, 700 were in the address field and 450 were in the products ordered field.
So, what I often see, then, is a table like this:

Orders                        Aug      Sep      Oct
  Orders Issued             10000     9000     9500
  Orders With Errors         2000     2500     2250
  % Orders With Errors     20.00%   27.78%   23.68%

Fields With Errors
  Name Field                  750     1000     1100
  Address Field               750      950      700
  Products Ordered            500      550      450

% Fields With Errors (i.e. number of field errors divided by total errors for that month)
  Name Field               37.50%   40.00%   48.89%
  Address Field            37.50%   38.00%   31.11%
  Products Ordered         25.00%   22.00%   20.00%

Now, here's what makes me uneasy -- usually where I see a table containing percentages of a total over a period of time, as in '% Fields With Errors' straight above, I will also see a graph with all three data elements plotted (i.e. Name Field, Address Field, Products Ordered) across the period examined (i.e. Aug, Sep, Oct), with some commentary like: "We can see from Sep to Oct that the percentage of errors in the Name Field increased, however we managed to decrease the percentage of errors in the Products Ordered field."

Now, I look at these percentages and I think to myself, 'They're percentages of a whole, so they must add up to 100% every month. If one goes up, then at least one other must fall. It doesn't seem to make sense to examine them as if they are measures that can be separately influenced (i.e. as if we could decrease percentages across the board).'

Is this a legitimate concern? I could understand it if one type of error was more 'important' than another: then perhaps you would be trying to minimise the percentage of that particular error, but you would expect the others to inflate as a result, yes?

Note: I know this is a very long-winded post, but I was having trouble expressing my concern briefly.

Also: does anyone know of a book / web site that points out common business statistical analysis errors? Any help appreciated!
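To make the arithmetic behind my concern concrete, here is a rough Python sketch using the fictional counts above (with field counts that add up to each month's error total). It works out each field's share of the month's errors, which is what the reports show, and next to it each field's errors as a percentage of orders issued. That second measure is purely my own assumption about what an independent measure might look like; it does not appear in any of the reports, and the function names are just made up for illustration.

```python
# Fictional counts from the example above, in the order Aug, Sep, Oct.
months = ["Aug", "Sep", "Oct"]
orders_issued = [10000, 9000, 9500]
field_errors = {
    "Name Field": [750, 1000, 1100],
    "Address Field": [750, 950, 700],
    "Products Ordered": [500, 550, 450],
}


def share_of_errors(field_errors):
    """Each field's count as a % of that month's total field errors.

    These shares are forced to sum to 100% in every month.
    """
    n_months = len(next(iter(field_errors.values())))
    totals = [
        sum(counts[i] for counts in field_errors.values())
        for i in range(n_months)
    ]
    return {
        field: [100 * c / t for c, t in zip(counts, totals)]
        for field, counts in field_errors.items()
    }


def rate_per_order(field_errors, orders_issued):
    """Each field's count as a % of orders issued that month.

    Hypothetical alternative measure: these rates are not tied to
    each other, so they can all rise or all fall together.
    """
    return {
        field: [100 * c / o for c, o in zip(counts, orders_issued)]
        for field, counts in field_errors.items()
    }


shares = share_of_errors(field_errors)
rates = rate_per_order(field_errors, orders_issued)

for field in field_errors:
    print(field)
    for i, month in enumerate(months):
        print(f"  {month}: share of errors {shares[field][i]:6.2f}%, "
              f"per orders issued {rates[field][i]:6.2f}%")
```

If I have done this right, the Products Ordered share falls from 25% in Aug to 22% in Sep, yet Products Ordered errors per order issued rise from 5% to about 6.1% over the same two months. So a falling share does not by itself seem to mean there were fewer of those errors, which is roughly what was bothering me.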
Many thanks, LW