Be aware that the accident reporting system feeding into the GFA system 
some years ago was significantly deficient.  In one state, reporting was 
running at about 50-70% of the claims rate.  

If we had not seen a change in the culture, discussions were going to be held 
with the insurance industry to obtain actual claims data.   


 Peter Heath 





---- Teal <[email protected]> wrote: 

=============


On 10/03/2016 6:50 PM, Texler, Michael wrote:
>>   I've not seen them described that way in the road safety literature that 
>> I'm familiar with. How would that work? If the number of accidents is on the 
>> Y axis, what variable would the X axis have? If we go with road accidents 
>> (my field of expertise) it can't be age/driving experience, because the 
>> accident stats in NO way form a Poisson distribution when age/experience is 
>> your X-axis variable. (Actually, road prangs by age/experience give you 
>> more of a U-shaped curve.) Also, rates of accidents (be they road prangs or 
>> glider prangs) aren't constant over time (as required for a Poisson 
>> distribution to be your distribution of choice) - they vary by time of day, 
>> for fairly obvious reasons, as well as other things (day of the week, long 
>> weekends, etc etc).
>
>> You appear to be approaching the issue from a rather different statistical 
>> approach to the ones I'm familiar with. Could you spell out your 
>> approach/methods in more detail? It's always interesting to hear how folk in 
>> other fields approach problems I'm familiar with. :-)
> I am approaching it as counting events occurring over a duration of time 
> (analogous to say counting disintegrations per second for radioactive decay).
>
> Y axis would be the accident rate with any metric that you care to choose 
> (i.e. accidents per 1,000 hours flown, accidents per 100km travelled, 
> accidents per 1,000 flights etc.).
> X axis would be a duration of time, i.e. over one year, over 10 years, over 
> 100 years.
>
> Then it is a case of using the appropriate test to compare the two groups 
> (null hypothesis being that the accident rate between two groups is the same).

I'm afraid I'm still not with you. *Which* two groups, exactly? 
Displaying all recorded traffic accidents over time in that way will (if 
you use Australian data) give you a single line that (depending on the 
period covered, but let's go with "the last 20 years") trends downward 
over time. Who are you comparing against whom, in your example?

> A fairly blunt measure granted.
>
> Given your experience with road accidents analysis, how would you approach it?

Well, it would depend on exactly which question was being asked.  If we 
were interested in the numbers of accidents had by drivers of different 
ages, my previous example (up in the first para quoted above) was a 
simple descriptive graph showing difference in number of accidents by 
age, for a set amount of time (a year, say). Or we could do it another 
way, and have a graph with dates along the X axis, and separate lines 
(one for each age group, maybe 16-25, 26-35 and so on) showing how 
accident numbers have changed over time for each age group, if we were 
interested in seeing if there were any obvious differences in crash 
rates over time by age group.

Or, if the question is whether a particular time of day is more crash-prone 
than other times, we could graph all the accidents occurring in the last 
year with the X axis showing hours of the day (midnight-0200, 0201-0400, 
etc). Or whatever.  All this is pretty basic stuff. We could go on from 
there, and report means and standard deviations for age groups/time 
periods/whatever of interest, and see if anything leaps out in terms of 
obvious differences or trends. But that still isn't going to get you 
anything you might want to discuss using null hypotheses or p values ... 
for that you really do need actual *inferential* statistical tests, with 
specific groups that you are comparing. And this broad-brush descriptive 
approach isn't going to give you that. You need to narrow it down a bit.
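
To make the descriptive step concrete, here is a minimal sketch in Python. 
The age groups and yearly crash counts are made-up numbers purely for 
illustration, and describe() is a hypothetical helper, not anything from a 
real crash database.

```python
from statistics import mean, stdev

# Hypothetical yearly crash counts per age group (invented for illustration).
crashes_by_age_group = {
    "16-25": [120, 135, 128, 140],
    "26-35": [80, 78, 85, 81],
    "36-45": [60, 66, 63, 63],
}

def describe(groups):
    """Return {group: (mean, sample standard deviation)} for each group."""
    return {name: (mean(values), stdev(values)) for name, values in groups.items()}

summary = describe(crashes_by_age_group)
for name, (m, sd) in summary.items():
    print(f"{name}: mean={m:.1f}, sd={sd:.1f}")
```

A table like this shows where the groups sit, but it does not say whether 
the gaps between them are bigger than random variation would produce.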

So: let's come back to the original topic that started all this - glider 
accidents. How would I approach that?

Well, first would be deciding exactly what question I want an answer to. 
Do I want to know if the glider prang rate is increasing or decreasing 
over time? Or do I want to know whether more crashes are happening in 
comps than in cross-country gliding?  Or how the glider crash rate as a 
whole compares with the number of motorcycle crashes for a given period?

Let's go with the last one, since we were also discussing that earlier. 
Firstly, getting a good source of data for *both* of those elements in 
the comparison is tricky. So I'm gonna handwave past that and assume 
that we have good quality data on both of these, including exposure data 
(i.e. how much time was spent per pilot/rider actually flying/riding 
during that time period), because exposure is critical for topics like 
this: it means absolutely nothing to say that there were 12 glider 
prangs and 355 bike prangs in a given period, if we don't *also* know 
that there were a lot more motorcyclists on the road, riding for a lot more 
overall hours, than there were glider pilots in the air during the same 
period.
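
A quick sketch of why exposure matters, in Python. The crash counts are the 
12 and 355 from the example; the exposure hours are invented for 
illustration, and rate_per_1000_hours() is a hypothetical helper.

```python
def rate_per_1000_hours(crashes, exposure_hours):
    """Crashes per 1,000 hours of exposure."""
    return 1000 * crashes / exposure_hours

# Crash counts come from the example above; the hours are assumptions.
glider_crashes, glider_hours = 12, 100_000
bike_crashes, bike_hours = 355, 5_000_000

glider_rate = rate_per_1000_hours(glider_crashes, glider_hours)
bike_rate = rate_per_1000_hours(bike_crashes, bike_hours)

print(f"gliders: {glider_rate:.3f} crashes per 1,000 hours")
print(f"bikes:   {bike_rate:.3f} crashes per 1,000 hours")
```

With these assumed hours the group with far fewer crashes ends up with the 
higher rate per hour; different exposure figures could just as easily flip 
it the other way, which is exactly why the raw counts tell us nothing.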

OK. So now I hypothetically have ten years' worth of crash rates per 
hour of flying or riding for the respective groups, and I want to 
compare them. This is where the inferential statistics come in. There 
will be differences between any two groups that are simply random 
chance, but the real trick is identifying *actual* differences through 
the "noise" of random variation. We want to perform a simple comparison 
of the two groups, to see if they basically have the same means and 
variances - i.e. is it reasonable to assume they're both samples from 
one overall population? (Yes, I know they're probably not in real life, 
but that's how the statistical tests work.) In this example I'd probably 
go for a t-test for independent samples (since we're assuming that the 
bikers and the pilots are, by and large, different people). And what 
that would give me would be a probability value which, as you pointed 
out earlier, is the probability of seeing a difference at least this 
large between the groups if random chance were the only thing at work. 
So if we get a p value of .05 from my t-test, that tells us that a gap 
this big would turn up only about 5% of the time if our bikers and 
glider pilots really had the same underlying crash rate. That is not 
quite the same as a 95% chance that the difference is real, but by 
convention it is rare enough to treat it as one.
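
Here is a rough sketch of that comparison in Python, using only the standard 
library. The ten yearly rates per group are invented. Two things in it are 
my own substitutions: welch_t() computes Welch's version of the 
independent-samples t statistic (it does not assume equal variances), and 
because the standard library has no t distribution, permutation_p() swaps 
in a permutation test for the usual table lookup. Both function names are 
hypothetical.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def permutation_p(a, b, n_shuffles=10_000, seed=0):
    """Two-sided p value from random relabelling of the pooled data."""
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add one to both counts so the estimate can never be exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Invented crash rates per 1,000 hours, one value per year for ten years.
glider_rates = [0.12, 0.10, 0.14, 0.11, 0.13, 0.12, 0.15, 0.09, 0.12, 0.11]
bike_rates = [0.07, 0.08, 0.06, 0.07, 0.08, 0.07, 0.06, 0.07, 0.08, 0.07]

t_stat = welch_t(glider_rates, bike_rates)
p_value = permutation_p(glider_rates, bike_rates)
print(f"t = {t_stat:.2f}, permutation p = {p_value:.4f}")
```

The p value here is the share of random relabellings of the pooled data 
that produce a gap between group means at least as large as the one 
observed.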

Let's mix it up a bit. What if we want to add other factors into the 
model to see if that makes any difference... age, say. Are the patterns 
of accidents for pilots and bikers of different ages similar? Does it 
matter what the age of the vehicle they're flying/riding is? For those 
I'd probably run a regression or analysis of variance of some kind on 
the data, with the exact type dependent on the exact nature of the 
additional factor(s) I'm plugging into the model. Or let's say I come 
across a group of bikers who also fly gliders. That's extra-useful, 
because, as the *same* individuals doing both activities, we can get a 
*lot* more statistical power out of whatever model we choose. 
Repeated-measures analysis of variance may well be my tool of choice for 
that sort of analysis. Or maybe even a mixed-effects general linear 
model (now, *those* can get complex enough to lead to tears and tearing 
of hair...)

And so on it goes.

Does that help clarify things?


Teal



_______________________________________________
Aus-soaring mailing list
[email protected]
http://lists.base64.com.au/listinfo/aus-soaring
