The EmpiricalDistribution class in the random package is designed to
support large samples. It does not store all of the data points in
memory, but instead bins the data and uses smoothing kernels within
the bins. I have recently had the need for a discrete empirical
distribution - i.e., an
This will be very useful.
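
For concreteness, here is a minimal sketch of what such a class might look like. It assumes that "discrete empirical distribution" means the plain empirical distribution that keeps every observation in memory and puts probability 1/n on each one. The class name DiscreteEmpirical and its methods are hypothetical and are not part of Commons Math or Mahout.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch only; not an existing Commons Math or Mahout class. */
final class DiscreteEmpirical {
    // All observations, kept in memory and sorted ascending (no binning, no kernels).
    private final double[] sorted;

    DiscreteEmpirical(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        sorted = data.clone();
        Arrays.sort(sorted);
    }

    /** Empirical CDF: the fraction of observations less than or equal to x. */
    double cdf(double x) {
        // Binary search for the number of stored values <= x.
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;
    }

    /** Draws one of the observed values; each stored point has probability 1/n. */
    double sample(Random rng) {
        return sorted[rng.nextInt(sorted.length)];
    }
}
```

Repeated values simply appear several times in the array, so they are drawn proportionally more often. The trade-off against the binned approach is memory: this stores the whole sample, so it only suits data sets that fit comfortably in memory.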
Sampling from discrete ECDFs is also closely related to generating samples
from a multinomial distribution. I did a bit of work on the latter
problem. The result of that work is in
org.apache.mahout.math.random.Multinomial
The major difference that you will have is