Hi all, I have a question for the list regarding grouping (binning) of the independent variable in a linear regression. This is routinely done (at least in limnology) in studies involving so-called biomass size-spectra. I'm aware of other (better) methods to fit non-linear models. However, I need to compare my results with older literature where this method is used widely, and I'd like to know first if the method has a problem or if it is outright wrong.
My independent variable is the mean body size of the individuals of a species (M), and the dependent variable is either the biomass (B, g/m2) or the population density (D, indiv/m2) of the species. Body size is lognormally distributed, and the number of species in the sample is ~100.

The model to fit is: D = aM^b. First, the data are log-transformed in order to apply linear least-squares regression, so the model becomes log(D) = log(a) + b log(M). The appropriateness of this transformation and the possible bias in the estimation of the parameters have been discussed before (Zar, Smith, others), so my question is not about that.

After log-transforming, sizes are grouped into evenly spaced categories, and the densities/biomasses for all sizes within a size group are summed. So the independent variable becomes the center of each log-size bin, and the dependent variable becomes the sum of all log-densities in each size bin. Obviously, the number of data points is reduced from the original N to the number of size groups/bins used. After grouping, the log-log model is fitted by least-squares regression.

So my questions are: Is this binning of a log-transformed variable statistically appropriate for this problem? Wouldn't it be better to use the size and density of each species directly, without any grouping?

Thanks in advance for any suggestion or literature.

Cheers
Francisco de Castro
Potsdam University
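P.S. In case it helps to see the two alternatives side by side, here is a minimal Python sketch (using NumPy) of the procedure as described above. The data are synthetic and made up purely for illustration: the values of log(a), b, the noise level and the number of bins (8) are assumptions, not my real data, and the function names are my own.

```python
import numpy as np


def fit_unbinned(log_m, log_d):
    """Least-squares fit of log(D) = log(a) + b*log(M), one point per species."""
    slope, intercept = np.polyfit(log_m, log_d, 1)
    return slope, intercept


def fit_binned(log_m, log_d, n_bins=8):
    """Fit after grouping log(M) into evenly spaced bins.

    x = center of each log-size bin
    y = sum of the log-densities of all species falling in that bin
    """
    edges = np.linspace(log_m.min(), log_m.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Bin index 0..n_bins-1; clip so the largest size lands in the last bin.
    idx = np.clip(np.digitize(log_m, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=log_d, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    keep = counts > 0  # drop empty bins before fitting
    slope, intercept = np.polyfit(centers[keep], sums[keep], 1)
    return slope, intercept, centers[keep], sums[keep]


# Synthetic example (all numbers below are assumed, for illustration only).
rng = np.random.default_rng(0)
n_species = 100
log_m = rng.normal(0.0, 1.0, n_species)      # log10 body size: M is lognormal
true_log_a, true_b = 2.0, -0.75              # hypothetical parameters
log_d = true_log_a + true_b * log_m + rng.normal(0.0, 0.3, n_species)

b_raw, log_a_raw = fit_unbinned(log_m, log_d)
b_bin, log_a_bin, centers, sums = fit_binned(log_m, log_d)

print(f"unbinned: b = {b_raw:.3f}, log(a) = {log_a_raw:.3f}, n = {n_species}")
print(f"binned:   b = {b_bin:.3f}, log(a) = {log_a_bin:.3f}, n = {len(centers)}")
```

The binned fit uses only as many points as there are non-empty bins, and each binned value depends on how many species happen to fall in that bin, which is the part I am unsure about.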
