Dave Nulton wrote:


> I'm writing a simulator in C++.  So far I have written a program to collect
> data from a database and hope to be able to generate an algorithm to return
> a random value with a distribution that matches my real world data.  What
> I'm finding is that the data is UGLY.  In order to generate a reasonable
> representation of the data, I'd need almost 3 million bins, and then most of
> the information would be crammed into the first 1000 or so bins.  I've drawn
> an ASCII art representation below.
>
> I don't want to give up those flyers, because they sum up to a considerable
> amount.  I'm modeling man loading in a manufacturing facility, so throwing
> out the flyers will really skew my simulator.
>
> Has anyone ever encountered such a problem?  Better yet, can someone
> recommend a C++ algorithm to model my data?  I'm thinking I may have to go
> to some sort of a logarithmic distribution, but it is important to base my
> simulator on real world data and not generic algorithms.  I would be willing
> to fit a model if I knew of a good model and how to utilize it in C++.
>
> -dnult
>     / \
>    /   \
>   /     \   /\      !           /\         ^   .  . .
> *'        `'  `*********** ******* ********
>

    Two points:

        (1) I and others have said on various occasions:  please, if you're
asking for advice about a data set, tell people what the data are.  I'm not
entirely sure that I understand the psychology of withholding that, but the
result is akin to going to the dentist and refusing to open your mouth.

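    (2) Try transforming the data before you bin it; for data this skewed,
taking logarithms is the usual first thing to try.  I don't know if this is
good advice or not - see (1).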
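
    For what it's worth, here is a minimal C++ sketch of one way
"transforming" could be used for sampling.  It is an illustration only, not
a recommendation for this particular data set.  It assumes every observation
is strictly positive and that a log transform is a sensible choice, neither
of which can be confirmed from the post.  The class name LogBinnedSampler and
its parameters are hypothetical.  The idea is to bin log(x) instead of x, so
that a modest number of equal-width bins covers both the crowded low end and
the flyers, then pick a bin in proportion to its count and transform back.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the original post): draws random values
// from an empirical distribution using bins of equal width in log(x).
// Assumes every observation is strictly positive.
class LogBinnedSampler {
public:
    LogBinnedSampler(const std::vector<double>& data, std::size_t numBins) {
        assert(!data.empty() && numBins > 0);
        auto mm = std::minmax_element(data.begin(), data.end());
        assert(*mm.first > 0.0);  // the log transform needs positive values
        logMin_ = std::log(*mm.first);
        const double logMax = std::log(*mm.second);
        // Width is zero only if all observations are identical.
        width_ = (logMax - logMin_) / static_cast<double>(numBins);

        // Histogram of the log-transformed data.
        std::vector<double> counts(numBins, 0.0);
        for (double x : data) {
            std::size_t bin = 0;
            if (width_ > 0.0) {
                bin = static_cast<std::size_t>((std::log(x) - logMin_) / width_);
            }
            // The maximum falls exactly on the upper edge; keep it in range.
            counts[std::min(bin, numBins - 1)] += 1.0;
        }
        pickBin_ = std::discrete_distribution<int>(counts.begin(), counts.end());
    }

    // Pick a bin with probability proportional to its count, choose a point
    // uniformly inside it (in log space), then undo the transform.
    template <class Rng>
    double operator()(Rng& rng) {
        const int bin = pickBin_(rng);
        const double u = std::uniform_real_distribution<double>(0.0, 1.0)(rng);
        return std::exp(logMin_ + (bin + u) * width_);
    }

private:
    double logMin_ = 0.0;
    double width_ = 0.0;
    std::discrete_distribution<int> pickBin_;
};
```

    Usage would be along the lines of constructing LogBinnedSampler
sampler(data, 50) once from the collected values and then calling
sampler(rng) with a std::mt19937 each time the simulator needs a draw.  The
choice of 50 bins is arbitrary.  Values within a bin are smoothed, so this
trades some fidelity to the raw data for a much smaller table.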

    -Robert Dawson



