[EMAIL PROTECTED] (Serge) wrote in news:[EMAIL PROTECTED]:
> Hello,
>
> I have to do a final project for my Statistics class. I chose to do a
> project on the relationship between a car's weight and its 4 crash
> test ratings (2 front for passenger and driver and 2 side for front
> and rear). The ratings are 1 - 5 (the number of stars); the higher the
> rating, the safer the car. I got all my data from crashtest.com. In
> all, I got data for 187 cars - that is, the weight and the 4 ratings.
> However, when I did a scatterplot of any 1 of the front ratings vs.
> weight the r coefficient was very close to 0 and the plot did not show
> a pattern. When I plotted the side ratings vs. weight, r was about .6
> in both cases (a mild relationship). Thus, I can't really conclude
> anything from this. What I am thinking of doing and what I am asking
> about is this: If I separate my data into 9 groups based on weight
> where each group spans 500 pounds so that group 1 has cars with
> weights 1501 - 2000 and group 9 has 5501 - 6000 and then take the mean
> of all 4 ratings in each of these groups and plot them against the
> mean weight in each group for all 4 ratings (again, 4 separate plots),
> will I get a more definite relationship (I haven't done it yet so I
> don't know) and even more importantly, will this apparent relationship
> be statistically significant and why or why not (why does grouping
> help or why not)? If not, can you suggest something else that I may be
> able to do. Thanks so much for your help.

I think I see where you want to go, but what you're proposing can't take
you there. Breaking a continuous variable into categories based only on
its actual value can only lose information, not add it. Averaging within
each weight group hides the car-to-car scatter, so the group means can
line up nicely even when the individual cars do not, and nine points
give you far less evidence than 187. Any apparent relationships that pop
up would be merely artifacts of the averaging and of your choice of
cut-points. This is known colloquially as "torturing the data until it
confesses"; the problem is that it's very easy to get a false
confession. What you need to do is *increase* the amount of information
available to you.
You could, for example, break the cars into categories based on their
industry-standard classifications (compact sedans, sports cars, SUVs,
etc.) and then examine the relationships between weight and rating both
within groups and between groups. The point is that grouping can
increase your information if it's based on subject-matter knowledge, but
not if it's based only on the numerical properties of your existing
data.
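
If it helps to see the binning problem concretely, here is a rough
simulation sketch in Python. Everything in it is an assumption made for
illustration: the data are simulated (not the crashtest.com data), the
weak trend and noise level are arbitrary, and the helper names
pearson_r, bin_means and t_stat are made up for this example. It
compares r for the individual cars with r for the means of 500-pound
weight groups, and reports the usual t statistic for a correlation,
t = r * sqrt((n - 2) / (1 - r^2)), with its degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def bin_means(xs, ys, lo=1500, width=500, n_bins=9):
    """Group points into fixed-width bins of x; return per-bin mean x and mean y."""
    bins = {}
    for x, y in zip(xs, ys):
        k = min(int((x - lo) // width), n_bins - 1)  # clamp the top edge into the last bin
        bins.setdefault(k, []).append((x, y))
    keys = sorted(bins)
    mean_x = [mean([x for x, _ in bins[k]]) for k in keys]
    mean_y = [mean([y for _, y in bins[k]]) for k in keys]
    return mean_x, mean_y

def t_stat(r, n):
    """t statistic for testing zero correlation, on n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data (an assumption, not real crash-test results):
# 187 cars, a weak true trend in rating with weight, plus a lot of noise.
weights = [rng.uniform(1500, 6000) for _ in range(187)]
ratings = [3.0 + 0.0002 * (w - 3750) + rng.gauss(0, 1.0) for w in weights]

r_raw = pearson_r(weights, ratings)
bw, br = bin_means(weights, ratings)
r_binned = pearson_r(bw, br)

print(f"individual cars: n = {len(weights)}, r = {r_raw:.2f}, "
      f"t = {t_stat(r_raw, len(weights)):.2f} on {len(weights) - 2} df")
print(f"group means:     n = {len(bw)}, r = {r_binned:.2f}, "
      f"t = {t_stat(r_binned, len(bw)):.2f} on {len(bw) - 2} df")
```

In runs like this the r for the group means will usually come out
larger in magnitude than the r for the individual cars, because the
averaging removes most of the scatter. The second line rests on only
nine points, though, so the larger r is not stronger evidence, and
moving the cut-points or changing the seed will move it around.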
