2010/8/12 Renato Fabbri <[email protected]>:
> Dear All,
>
> help appreciated, thanks in advance.
>
> how do you fit a pdf you have with a given pdf (say gamma).
>
> with the file attached, you can go like:
>
> a=open("AC-010_ED-1m37F100P0.txt","rb")
> aa=a.read()
> aaa=aa[1:-1].split(",")
> data=[int(i) for i in aaa]
>
> if you do pylab.plot(data); pylab.show()
>
> The data is something like:
> ___|\___
>
> It is my pdf (probability density function).
>
> how can i find the right parameters to make that fit with a gamma?
>
> if i was looking for a normal pdf, for example, i would just find mean
> and std and ask for the pdf.
>
> i've been playing with scipy.stats.distributions.gamma but i have not
> reached anything.
>
> we can extend the discussion further, but this is a good starting point.
>
> any idea?

A general point on fitting empirical probability density functions is that it is often much easier to fit the cumulative distribution function instead. For one thing, this means you don't have to decide on the intervals of the bins in the histogram. For another, it's often the cdf that is more closely related to the final answer (though I don't know your application, of course). Here's a quote.

`So far the discussion of plots of distributions has emphasized frequency (or probability) vs. size plots, whereas for many applications cumulative plots are more important. Cumulative curves are produced by plotting the percentage of particles (or weight, volume, or surface) having particle diameters greater than (or less than) a given particle size against the particle size. … Such curves have the advantage over histograms for plotting data that the class interval is eliminated, and they can be used to represent data which are obtained in classified form having unequal class intervals' (Cadle, R. D. 1965. Particle Size. New York: Reinhold Publishing Corporation, pp. 38-39)

Once you've got your empirical cdf, the problem reduces to one of nonlinear curve fitting, for whichever theoretical distribution you like. For a tutorial on nonlinear curve fitting, see scipy.optimize.leastsq at http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html. You could of course use this approach for the pdf too, but I fancy the cdf result will be more robust.

On the other hand, if you want something like your `mean and variance' approach to fitting normal distributions, you could still equate your sample mean and variance with the known expressions for the Gamma distribution (available e.g. on its Wikipedia page) and back out the two parameters of the distribution from them. I'm not too sure how well this will work, but it's pretty easy.

Another idea that occurs to me, and is about as easy, is to compute the two parameters of the Gamma distribution by collocation with the empirical cdf; i.e. pick two quantiles, e.g. 0.25 and 0.75, or whatever, and get two equations for the two unknown parameters by insisting that the Gamma cdf agree with the empirical cdf at these quantiles. This might be more robust than the mean & variance approach, but I haven't tried either.

Good luck!

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
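
For anyone who wants to try the three suggestions above (a least-squares fit to the empirical cdf, moment matching, and two-quantile collocation), here is a minimal sketch. It rests on several assumptions that are not from the thread: the data are treated as raw samples, synthetic Gamma draws stand in for the attached file, the Gamma distribution is taken in its two-parameter shape/scale form with the location fixed at zero, and the empirical cdf uses the plotting positions (i - 0.5)/n, which is only one common choice. All function names are illustrative. For the collocation step, the scale cancels in the ratio of two quantiles, so the two equations reduce to a one-dimensional root search for the shape.

```python
import numpy as np
from scipy import optimize, stats


def empirical_cdf(samples):
    """Return the sorted samples and plotting positions (i - 0.5) / n."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.5) / n
    return x, p


def fit_by_moments(samples):
    """Back out shape k and scale theta from mean = k*theta, var = k*theta**2."""
    mean = np.mean(samples)
    var = np.var(samples, ddof=1)
    return mean**2 / var, var / mean


def fit_by_quantiles(samples, probs=(0.25, 0.75)):
    """Make the Gamma cdf agree with the empirical cdf at two quantiles."""
    q_lo, q_hi = np.quantile(samples, probs)

    def ratio_mismatch(log_k):
        # The scale cancels in the quantile ratio, leaving one equation in k.
        k = np.exp(log_k)
        model_ratio = stats.gamma.ppf(probs[1], k) / stats.gamma.ppf(probs[0], k)
        return model_ratio - q_hi / q_lo

    # Search for the shape on a log scale; the bracket [0.05, 1000] is an assumption.
    k = np.exp(optimize.brentq(ratio_mismatch, np.log(0.05), np.log(1e3)))
    theta = q_lo / stats.gamma.ppf(probs[0], k)
    return k, theta


def fit_by_cdf_least_squares(samples, start):
    """Nonlinear least squares between the Gamma cdf and the empirical cdf."""
    x, p = empirical_cdf(samples)

    def residuals(log_params):
        k, theta = np.exp(log_params)  # fitting logs keeps both parameters positive
        return stats.gamma.cdf(x, k, scale=theta) - p

    solution, _ = optimize.leastsq(residuals, np.log(start))
    return tuple(np.exp(solution))


# Synthetic stand-in data: 2000 draws from a Gamma with shape 3 and scale 2.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=3.0, scale=2.0, size=2000)

k_mom, theta_mom = fit_by_moments(samples)
k_q, theta_q = fit_by_quantiles(samples)
# The moment estimates serve as the starting point for the least-squares fit.
k_cdf, theta_cdf = fit_by_cdf_least_squares(samples, start=(k_mom, theta_mom))

print("moments:      ", k_mom, theta_mom)
print("quantiles:    ", k_q, theta_q)
print("cdf least sq.:", k_cdf, theta_cdf)
```

If the file actually holds an already-binned density curve rather than raw samples, the empirical cdf would instead have to be built by cumulating the bin values, and the rest of the sketch would need adjusting accordingly.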
