> My point is your rating was based on an assumption that was totally
> incorrect: that the developer had made a reasonable effort to put the
> right gauges and levers in the right place. Do you make a similar
> assumption about the FDM? That it is approximately right? Is there much
> value in such a rating?
Vivian, I am sorry if I'm now taking a little more of a lecturing attitude - I do not know how much you know about mathematical statistics, but I have the impression you are completely missing the issue here.

What the rating represents is a screening procedure. A screening procedure is used to quickly assess a large number of something, to single out a subset with given properties. For instance, you might screen a population for breast cancer. Screening procedures are designed to process large numbers, i.e. they do not make use of all available diagnostic tools and replace detailed knowledge with plausibility, because applying detailed knowledge and detailed testing usually requires time and resources which are not available. A detailed cancer test requires you to be hospitalized for maybe 1-2 days; say that (optimistically) costs $200 - to do it for 100 million people once per year is $20 billion per year (hm...) - so maybe you'd rather test less accurately for $5 per person. Screenings therefore often test proxies, rather than the real property you're interested in.

For any given instance of the something, it is always true that a detailed test has more accurate results. It is also true that a screening produces both false positives (i.e. assigns a property to something which does in fact not have that property) and false negatives (i.e. does not assign the property to something which does in fact have it). It is not required (nor reasonable to require) that a screening procedure is always correct, or that the plausibility assumptions underlying it are always fulfilled. What is required is that the screening procedure is right most of the time. Depending on the problem, you want to minimize the rate of false positives, of false negatives, or both - in the cancer example, it is better to send a few more people to detailed testing than to miss too many real cancer cases, so you try to minimize the false negatives.
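To make the cost argument above concrete, here is a small sketch using the illustrative numbers from the paragraph (the costs and population size are the hypothetical figures quoted there, not real medical data):

```python
# Hypothetical cost comparison: detailed testing vs. cheap screening.
# All figures are the illustrative numbers from the text above.
POPULATION = 100_000_000      # people tested once per year
DETAILED_TEST_COST = 200.0    # USD per person, 1-2 days in hospital
SCREENING_COST = 5.0          # USD per person, quick proxy test

detailed_total = POPULATION * DETAILED_TEST_COST
screening_total = POPULATION * SCREENING_COST

print(f"Detailed testing for all: ${detailed_total / 1e9:.1f} billion/year")
print(f"Screening for all:        ${screening_total / 1e9:.1f} billion/year")
```

With those numbers, detailed testing of everyone costs $20 billion per year, while screening everyone costs $0.5 billion - which is why screenings trade accuracy for throughput.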
So, what you have shown with the KC-135 is a case in which a default assumption was wrong, but in which the scheme still (for whatever reason) gave a good answer. That's not very problematic - one wouldn't consider it problematic if a screening test picks up a cancer for the wrong reason when there in fact is a cancer.

Right now you have shown me one example in which the default assumption does not work. If there are no more, the assumption has an accuracy of 99.75%. Even if you can find as many as 40 planes with a similar history, in which the designer did not care about cockpit layout, the default assumption would still have an accuracy of 90%. That's pretty good to me - and the chance that the result is still reasonable even where the default assumption fails is better yet!

The Concorde is in some sense far more problematic, because it is actually a 'wrong' result - a false negative (i.e. a high-quality plane gets a low rating). But precisely the same question arises here: what is the rate of false negatives? What is the actual probability that this happens to a second plane in the sample? Of course I don't know that for a fact (because I have no detailed test data for all aircraft), but I can give an estimate based on the sub-sample of planes I know better - this is where statistics comes in (I could even compute error margins for that estimate, although I have not done that yet). And that estimate suggests that the rate of false positives and negatives is low - about 2.5% for a deviation of 5 points between quality and visuals - which means the scheme works better than that 97.5% of the time. Again, this is a number which I consider entirely reasonable.

It doesn't matter if the rating works perfectly in every instance, or if the assumptions capture every instance correctly. On average, the results are reasonable and they give you an overview.
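The accuracy figures above, and the error margins I mentioned, follow from simple binomial estimation. A minimal sketch - assuming, as the 99.75% figure implies, a sample of roughly 400 rated aircraft (the function name and sample size are my illustration, not part of the actual rating scheme):

```python
import math

def screening_accuracy(failures, sample_size):
    """Estimate the accuracy of a default assumption from observed
    failures in a sample, with a rough 95% normal-approximation
    error margin for the estimated proportion (hypothetical helper)."""
    p_fail = failures / sample_size
    accuracy = 1.0 - p_fail
    # standard error of a binomial proportion, times 1.96 for ~95%
    margin = 1.96 * math.sqrt(p_fail * (1.0 - p_fail) / sample_size)
    return accuracy, margin

# One known failure (the KC-135 case) in ~400 rated aircraft:
acc, margin = screening_accuracy(1, 400)
print(f"accuracy {acc:.2%} +/- {margin:.2%}")   # 99.75%

# Even 40 such cases would leave the default assumption at 90%:
acc40, _ = screening_accuracy(40, 400)
print(f"accuracy {acc40:.0%}")
```

The point is that a single counterexample barely moves the estimate; what matters is the observed failure rate across the whole sample.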
Having an overview picture of something with a 10% error margin is better than having no overview at all with a 1% error margin (screening 90% of a population for cancer with a 10% rate of false positives and negatives is far more effective than testing 1% of the population in detail with a 1% failure rate). *shrugs*

Codify any testing scheme you like, and I bet I can construct a case which is somehow not adequately treated by it. It doesn't matter that I can do that - what matters is the rate with which it actually happens, and the amount of resources it takes to run the scheme.

Hope that helps a bit,

* Thorsten