You report an empirical distribution like this:
| / \
| / \
| / \ /\ ! /\ ^ . . .
| *' `' `*********** ******* ********
@ @ @ @ @
Have you considered modelling it as a collection (sometimes called a
mixture) of separate distributions with means at the locations marked
(approximately) with "@"? It's not clear whether the proposed separate
distributions would best be modelled as normal (Gaussian) or Poisson or
lognormal -- depends partly on how you think the data might have been
constructed "in the raw". Also not clear, if one chooses "normal",
whether it would be reasonable to assume the same variance for all; your "!"
rather looks as though it was meant to represent a quite narrow peak,
which would argue for heteroscedastic distributions.
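Since you asked about C++: here is a minimal sketch of how one might draw random values from such a mixture, assuming normal components with a separate standard deviation for each peak. The names Component and NormalMixture are hypothetical, invented for this illustration, and the weights, means, and standard deviations would have to be estimated from your own data. It relies only on the standard <random> header.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One component of the mixture. Every number a caller supplies here is a
// placeholder to be estimated from the real data.
struct Component {
    double weight;  // relative frequency of this peak (need not sum to 1)
    double mean;    // location of the peak (an "@" in the sketch)
    double sd;      // spread; may differ per component (heteroscedastic)
};

class NormalMixture {
public:
    explicit NormalMixture(const std::vector<Component>& comps)
        : comps_(comps) {
        assert(!comps_.empty());
        std::vector<double> w;
        for (const Component& c : comps_) {
            assert(c.weight >= 0.0 && c.sd > 0.0);
            w.push_back(c.weight);
        }
        // discrete_distribution normalises the weights internally.
        pick_ = std::discrete_distribution<std::size_t>(w.begin(), w.end());
    }

    // Draw one value: choose a component by weight, then sample from it.
    template <class Rng>
    double operator()(Rng& rng) {
        const Component& c = comps_[pick_(rng)];
        std::normal_distribution<double> n(c.mean, c.sd);
        return n(rng);
    }

private:
    std::vector<Component> comps_;
    std::discrete_distribution<std::size_t> pick_;
};
```

One would construct a NormalMixture from a vector of components, seed a generator such as std::mt19937, and call mix(rng) once per simulated value. If lognormal components suited the data better, std::lognormal_distribution could be substituted in operator(), bearing in mind that its two parameters describe the underlying normal rather than the lognormal itself.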
Representation as a mixture would be rather more strongly
supported (more strongly, that is, than as pure empiricism) if the
several values "@" were themselves interestingly distributed, and in a
way that invited some theoretical thought. (For a simple-minded example,
at roughly equal intervals with relative frequencies that diminished
exponentially to the right. Or if the data in those peaks turned out to
be associated with useful categories.)
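To make the simple-minded example concrete, here is a small sketch that lays out peaks at equal intervals with weights shrinking by a constant ratio from each peak to the next. The name equally_spaced_peaks and its parameters are hypothetical, chosen only for illustration; whether your "@" values actually follow any such pattern is something only the data can say.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak {
    double location;  // centre of the peak
    double weight;    // relative frequency, normalised to sum to 1
};

// Hypothetical helper: the k-th peak sits at first + k * spacing and has
// weight proportional to ratio^k, with 0 < ratio < 1.
std::vector<Peak> equally_spaced_peaks(double first, double spacing,
                                       double ratio, std::size_t count) {
    assert(count > 0 && ratio > 0.0 && ratio < 1.0);
    std::vector<Peak> peaks;
    double w = 1.0;
    double total = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        peaks.push_back({first + k * spacing, w});
        total += w;
        w *= ratio;  // each peak is 'ratio' times as frequent as the last
    }
    for (Peak& p : peaks) {
        p.weight /= total;  // normalise so the weights sum to 1
    }
    return peaks;
}
```

Comparing the fitted peak locations and frequencies against a pattern of this kind is one informal way to judge whether the mixture has some structure worth theorizing about.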
On Sun, 9 Jan 2000, Dave and Kim Nulton wrote:
> I'm writing a simulator in C++. So far I have written a program to collect
> data from a database and hope to be able to generate an algorithm to return
> a random value with a distribution that matches my real world data. What
> I'm finding is that the data is UGLY. In order to generate a reasonable
> representation of the data, I'd need almost 3 million bins, and then most of
> the information would be crammed into the first 1000 or so bins. I've drawn
> an ASCII art representation below.
>
> I don't want to give up those flyers, because they sum up to a considerable
> amount. I'm modeling man loading in a manufacturing facility, so throwing
> out the flyers will really skew my simulator.
>
> Has anyone ever encountered such a problem? Better yet, can someone
> recommend a C++ algorithm to model my data? I'm thinking I may have to go
> to some sort of a logarithmic distribution, but it is important to base my
> simulator on real world data and not generic algorithms. I would be willing
> to fit a model if I knew of a good model and how to utilize it in C++.
>
> -dnult
> / \
> / \
> / \ /\ ! /\ ^ . . .
> *' `' `*********** ******* ********
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128