Dave Nulton wrote:
> I'm writing a simulator in C++. So far I have written a program to collect
> data from a database and hope to be able to generate an algorithm to return
> a random value with a distribution that matches my real world data. What
> I'm finding is that the data is UGLY. In order to generate a reasonable
> representation of the data, I'd need almost 3 million bins, and then most of
> the information would be crammed into the first 1000 or so bins. I've drawn
> an ASCII art representation below.
>
> I don't want to give up those flyers, because they sum up to a considerable
> amount. I'm modeling man loading in a manufacturing facility, so throwing
> out the flyers will really skew my simulator.
>
> Has anyone ever encountered such a problem? Better yet, can someone
> recommend a C++ algorithm to model my data? I'm thinking I may have to go
> to some sort of a logarithmic distribution, but it is important to base my
> simulator on real world data and not generic algorithms. I would be willing
> to fit a model if I knew of a good model and how to utilize it in C++.
>
> -dnult
> / \
> / \
> / \ /\ ! /\ ^ . . .
> *' `' `*********** ******* ********
>
Two points:
(1) I and others have said on various occasions: please, if you're
asking for advice about a data set, tell people what it is. I'm not
entirely sure that I understand the psychology of this practice, but the
result is akin to going to the dentist and refusing to open your mouth.
(2) Try transforming. I don't know if this is good advice or not - see
(1).
-Robert Dawson
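
For what it's worth, here is one way "try transforming" might look in C++. This is only a sketch under assumptions the thread does not confirm: that the observed values are strictly positive, and that a log transform is a sensible choice for this data. The class name LogHistogramSampler and its parameters are made up for illustration. It bins log(x) instead of x, so a tail stretching over millions of units fits in a modest number of bins, and the flyers are kept rather than thrown out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original post): draws random values from
// a histogram built on log(x) rather than on x itself.
class LogHistogramSampler {
public:
    // Assumption: `data` is non-empty and every value is > 0.
    LogHistogramSampler(const std::vector<double>& data, std::size_t bins)
    {
        assert(bins > 0);
        if (data.empty()) throw std::invalid_argument("no data");

        double lo = data.front(), hi = data.front();
        for (double x : data) {
            if (!(x > 0.0)) throw std::invalid_argument("log transform needs x > 0");
            lo = std::min(lo, x);
            hi = std::max(hi, x);
        }
        log_lo_ = std::log(lo);
        width_ = (std::log(hi) - log_lo_) / static_cast<double>(bins);

        // Count observations per equal-width bin in log space.
        std::vector<double> counts(bins, 0.0);
        for (double x : data) {
            std::size_t i = 0;  // all values identical -> everything in bin 0
            if (width_ > 0.0) {
                double pos = std::max(0.0, (std::log(x) - log_lo_) / width_);
                i = std::min(bins - 1, static_cast<std::size_t>(pos));
            }
            counts[i] += 1.0;
        }
        pick_bin_ = std::discrete_distribution<std::size_t>(counts.begin(), counts.end());
    }

    // Pick a bin with probability proportional to its count, place the value
    // uniformly inside that bin in log space, then transform back with exp.
    template <class Rng>
    double sample(Rng& rng)
    {
        std::size_t i = pick_bin_(rng);
        std::uniform_real_distribution<double> within(0.0, 1.0);
        return std::exp(log_lo_ + (static_cast<double>(i) + within(rng)) * width_);
    }

private:
    double log_lo_ = 0.0;
    double width_ = 0.0;
    std::discrete_distribution<std::size_t> pick_bin_;
};
```

Typical use would be to build one sampler from the values pulled out of the database, for example `LogHistogramSampler s(values, 200);`, and then call `s.sample(rng)` with a `std::mt19937` wherever the simulator needs a draw. The bin count of 200 is arbitrary. Whether the log is the right transform depends on what the data actually is, which is point (1) again.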