I'm writing a simulator in C++. So far I have written a program to collect
data from a database, and I hope to build an algorithm that returns random
values whose distribution matches my real-world data. What I'm finding is
that the data is UGLY. To get a reasonable representation of it with
equal-width bins, I'd need almost 3 million bins, and even then most of the
information would be crammed into the first 1000 or so. I've drawn an ASCII
art representation below.
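One way around the 3-million-bin problem might be to space the bins
logarithmically, so the crowded low end gets fine resolution while the long
tail needs only a few bins. The sketch below is a minimal illustration under
stated assumptions: every value is strictly positive, the raw observations fit
in memory, and the helper name make_log_binned_sampler is hypothetical. It
uses std::piecewise_constant_distribution from <random>, which draws a bin with
probability proportional to its weight (here, the observation count) and then
a value uniformly within that bin, so the flyers keep their full share of the
probability mass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: builds a sampler from raw observations using
// logarithmically spaced bins instead of millions of equal-width bins.
// Assumes every observation is > 0 and there are at least two distinct values.
std::piecewise_constant_distribution<double>
make_log_binned_sampler(const std::vector<double>& data, std::size_t num_bins) {
    assert(!data.empty() && num_bins > 0);
    auto mm = std::minmax_element(data.begin(), data.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    assert(lo > 0.0 && hi > lo);

    const double log_lo = std::log(lo);
    const double step = (std::log(hi) - log_lo) / static_cast<double>(num_bins);

    // Bin edges lo, lo*r, lo*r^2, ..., hi, where r = exp(step).
    std::vector<double> edges(num_bins + 1);
    edges[0] = lo;         // set exactly so round-off cannot shift the minimum
    for (std::size_t i = 1; i < num_bins; ++i)
        edges[i] = std::exp(log_lo + step * static_cast<double>(i));
    edges[num_bins] = hi;  // set exactly so round-off cannot cut off the maximum

    // Count observations per bin; every flyer lands in some bin and keeps its weight.
    std::vector<double> counts(num_bins, 0.0);
    for (double x : data) {
        const double pos = (std::log(x) - log_lo) / step;
        std::size_t b = pos <= 0.0 ? 0 : static_cast<std::size_t>(pos);
        if (b >= num_bins) b = num_bins - 1;  // the maximum sits on the last edge
        counts[b] += 1.0;
    }

    // Weights are each bin's probability mass; values are drawn uniformly
    // within the chosen bin.
    return std::piecewise_constant_distribution<double>(
        edges.begin(), edges.end(), counts.begin());
}
```

Usage would be something like `std::mt19937 gen(seed); double x = sampler(gen);`.
For example, if the values ran from 1 to 3,000,000, 50 such bins would each
span a factor of about 1.35, which is a lot fewer than 3 million. The cost is
that values inside a wide tail bin come out uniform rather than following the
exact observed spread.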
I don't want to throw away those flyers (the rare, very large values out in
the tail), because together they add up to a considerable amount. I'm
modeling man loading in a manufacturing facility, so discarding the flyers
would seriously skew my simulator.
Has anyone run into a problem like this? Better yet, can someone recommend a
C++ algorithm to model my data? I'm thinking I may have to move to some sort
of logarithmic distribution, but it is important to base my simulator on
real-world data rather than generic algorithms. I would be willing to fit a
model if I knew of a good one and how to use it in C++.
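One candidate for that "logarithmic distribution" is a lognormal, i.e. a model
in which log(value) is normally distributed. This is an assumption, not
something my data has been shown to follow. The sketch below fits the two
parameters by maximum likelihood (the mean and standard deviation of the logs)
and hands them to std::lognormal_distribution from <random>, whose m and s
parameters are exactly those two quantities. The names LognormalFit,
fit_lognormal and make_lognormal_sampler are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Parameters of the underlying normal: log(X) ~ Normal(mu, sigma).
struct LognormalFit {
    double mu;
    double sigma;
};

// Maximum-likelihood lognormal fit: the MLEs are the sample mean and the
// population (divide-by-n) standard deviation of log(x).
// Assumes all values are > 0 and at least two of them differ.
LognormalFit fit_lognormal(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) {
        assert(x > 0.0);  // a lognormal only covers strictly positive values
        sum += std::log(x);
    }
    const double mu = sum / static_cast<double>(data.size());

    double sq = 0.0;
    for (double x : data) {
        const double d = std::log(x) - mu;
        sq += d * d;
    }
    const double sigma = std::sqrt(sq / static_cast<double>(data.size()));
    assert(sigma > 0.0);  // std::lognormal_distribution requires s > 0
    return {mu, sigma};
}

// Wraps the fitted parameters in the standard library's lognormal sampler.
std::lognormal_distribution<double> make_lognormal_sampler(const LognormalFit& fit) {
    return std::lognormal_distribution<double>(fit.mu, fit.sigma);
}
```

Before trusting a fit like this, it would be worth comparing how much of the
total the model puts in the tail against what the flyers actually contribute,
since a lognormal can understate a heavy tail; if it does, sampling from the
data itself is the safer choice.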
-dnult
[ASCII sketch, alignment lost: one tall peak over the first few bins, then a
long, flat tail dotted with small bumps (the flyers) stretching far to the
right.]