> Nevertheless, I am not persuaded. Your rating is based on: "Four legs
> good, two legs bad!". While that may be generally true, it will throw up
> many anomalies, and the problem is you neither know which these are,
> nor how many, because you haven't and can't properly test your hypothesis.

First of all, I'm not making (and haven't made) any strong statements
about the accuracy of FDMs, because the number of planes for which I have
an idea what that rating should be is small, and I think there is a
general consensus that judging an FDM adequately is a lot of work. My
statements about 'quality' and its correlation with 'beauty' are chiefly
based on the modelling of systems, instrumentation and implemented
procedures - these I can judge better.

It is simply not true that I can't test my hypothesis with regard to
instrumentation: I have flown about 40 aircraft with some regularity
since installing Flightgear, I have looked at real cockpit photographs
for some of them, and I have read their documentation and know what the
different buttons do, so I have a fair idea of how detailed their
instrumentation modelling is. My hypothesis for fairly detailed planes is
tested on that subsample of 10% of the available aircraft.

In addition, there are another 40+ aircraft for which the lack of
instrumentation and systems is fairly obvious (i.e. I see no gauges in the
cockpit...) even without spending a long time in the aircraft. For these
I likewise claim knowledge of the quality of the systems implemented.

So I do know about 20% of the total number of aircraft in sufficient
detail to estimate a correlation.

I think a fair statement is: 'A rating for the detail of instrumentation
and systems has an 80% chance of being no more than 2 points away from a
rating of visual detail', i.e. there is an 80% chance that the visual
rating and the final rating (averaged over visual and instrumentation
detail) do not differ by more than 1 point.

Let's look at a few examples (not brought up by myself):

***

Stuart's rating of the c172p: 4/5
Rescaled to a 10-point scale, that's 8/10 where I have 7/10 - check.


***

The KC-135
I'm not sure what your quality rating from 0-10 would be - probably not
really zero, so I assume it's 1 or 2; averaged with the visuals, that's
about 2 or 2.5 where I have rated 3 - yes.

***

Sopwith Camel
> Does it win the ratings
> war?

Indeed it does - it received 10/10.

***

Lightning

Assuming you'd rate the FDM and systems 10, the average with beauty would
be something like 9.7 or 9.5, depending on whether I take the FDM into
consideration. My rating is 9.

***

p51d-jsbsim

Hal self-rated it 7.5/10; I rated the p51d 6 - that would pretty much
fit already, except that the p51d-jsbsim is a bit more detailed than the
p51d, so I would rate it 7 (well, sure, I can say this after the
fact...). Yes, this fits as well.


***

It seems you are bothered more by the fact that the F-14b is rated above
the Lightning, but here you are asking too much of the test. The test can
pick out both planes as 'has a high probability of having very detailed
systems and an above-average FDM', but if you want to know which of them
is better in detail, the accuracy is not sufficient. The correlation
between 'beauty' and 'quality' is there, but it is not that strong;
correlation isn't equality.

Which is why I am very much in favour of bringing in additional
information (like the developer self-rating Stuart suggested).

My point is not that rating based on visual detail is perfect and we
should leave it at that - my point is that in practice it works quite a
bit better than a mere beauty contest.

> If I recall my stats correctly, your assumption that there is a causal
> relationship between attractiveness of the cockpit and a high realism is
> unproven. In our statistically small sample, I think it will throw up as
> many wrong results as correct ones. Concorde is but one example.

I have not assumed a causal relationship (nor do I need to). I observe a
correlation in practice and I utilize it; I don't need to understand it
to do so (though I have speculated about where it comes from...).

Assuming that Stuart and Hal did the self-rating fairly, and assuming you
did not know my numbers when you picked your examples, the system has
(within its accuracy) in fact not thrown up as many wrong results as
correct ones. From the above, it has managed well with 5/5 examples -
5/6 if you add the Concorde, but that one was cherry-picked by myself as
a counterexample (!) and therefore doesn't really count for a statistical
test of the hypothesis.

So, under the reasonable assumption that you didn't pick planes randomly
but picked planes you assumed would likely be counterexamples to my
rating, you have to grant me that the system has dealt with them rather
well, and that your supposed counterexamples have in fact not produced as
many wrong results as correct ones.

* Thorsten

