Thanks for the various comments I've gotten (most sent directly to me) on
my problem with random sampling from correlation matrices.  

For those who've requested, here's a little bit of background information.
I'm interested in a biological phenomenon known as morphological
integration, and I work on skeletal development in vertebrates (mostly
fishes).  Animals become regionally compartmentalized during development,
such that some suites of bones (those of the jaw, for example, or of the
forelimb) become more tightly integrated with one another (ie, more highly
correlated in their sizes and shapes) than they are with bones in other
suites, although all are correlated at some level.  This can be modeled as
a time-dependent correlation matrix, in which the correlations change with
age or size, with increasing within-suite correlations and decreasing
among-suite correlations.  (Actually, we use covariances rather than
correlations because the scaling is important, but the principles are the
same.)

I'm interested in modeling this for several reasons.  First, several
different quantitative measures of morphological integration (indices) are
in use in the literature, and I'm interested in their (largely unknown)
distributional properties.  Second, morphological integration relates to
several other biological aspects of development, such as fluctuating
bilateral asymmetry, allometric gradients, and metamorphosis, all of which
can also be modeled with time-dependent covariances.

So, what I want to be able to do is to postulate a set of "target"
correlation matrices, varying such things as the numbers of character
(=variable) suites, the numbers of characters per suite, and the strengths
of the within-suite and among-suite correlations, and for each of these to
generate samples of potential "morphologies".  Although most such matrices
will be similar to those observed for real organisms and thus very well
behaved, I occasionally want to gradually push the envelope to extreme
conditions, and that's when I bump into statistically incompatible or
ill-conditioned sets of correlations.  It seemed reasonable to me in such
cases to step back to the "closest" correlation matrix that is internally
consistent, which is where my problem arose.
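For anyone who wants to experiment along with me: the generation step (the Kaiser & Dickman 1962 method mentioned in my original post, quoted below) amounts to multiplying iid standard normal deviates by a matrix square root of the target correlation matrix.  My own code is in Matlab, but here's a rough NumPy sketch of the same idea (function name and example matrix are just illustrative):

```python
import numpy as np

def sample_correlated(R, n, rng=None):
    """Generate n observations whose population correlation matrix is R.

    Kaiser & Dickman-style construction: X = Z @ F.T, where F @ F.T == R.
    F is built from the eigendecomposition of R, so R must be positive
    semidefinite -- negative eigenvalues would make F complex.
    """
    rng = np.random.default_rng(rng)
    vals, vecs = np.linalg.eigh(R)              # R = V diag(vals) V'
    F = vecs * np.sqrt(np.clip(vals, 0, None))  # factor with F @ F.T == R
    Z = rng.standard_normal((n, R.shape[0]))    # iid N(0,1) deviates
    return Z @ F.T

# Example target: two tightly integrated suites of 2 characters each,
# with weaker among-suite correlations
R = np.array([[1.0, 0.8, 0.2, 0.2],
              [0.8, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.8],
              [0.2, 0.2, 0.8, 1.0]])
X = sample_correlated(R, n=100_000, rng=0)
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

With a large sample like this, the observed correlations should match the target to about two decimal places; for Monte Carlo work you'd use the actual per-sample sizes instead.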

Several people have suggested to me the following numerical solution: get
the eigenvectors and eigenvalues, set the negative eigenvalues to zero
(there's generally only one that's negative) and proportionately adjust the
others to maintain the same sum (total variance), and reconstruct the
correlation matrix.  I've tried it, and so far it seems to work very well
in practice.  However, Rich Ulrich has raised the spectre of "nearly
invalid" results, and so what I plan to do is to begin with a
well-conditioned correlation matrix and gradually change it until it
becomes indefinite (i.e., no longer positive semidefinite), and check whether
the adjustment is consistent with the changes I made in the matrix leading
up to the ill-conditioning.
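In case it's useful to others, the adjustment described above can be sketched as follows (again in NumPy rather than Matlab; the renormalization of the diagonal back to 1 is my own addition, since the rescaling perturbs it slightly):

```python
import numpy as np

def clip_to_psd_correlation(R):
    """Repair a non-PSD 'correlation' matrix by eigenvalue clipping.

    Negative eigenvalues are set to zero, the remaining ones are rescaled
    so their sum (the total variance = number of variables) is unchanged,
    and the matrix is rebuilt and renormalized to a unit diagonal.
    """
    vals, vecs = np.linalg.eigh(R)
    clipped = np.clip(vals, 0.0, None)
    clipped *= vals.sum() / clipped.sum()   # preserve trace = p
    R_psd = (vecs * clipped) @ vecs.T       # reconstruct the matrix
    d = np.sqrt(np.diag(R_psd))             # renormalize diagonal to 1
    R_psd = R_psd / np.outer(d, d)
    np.fill_diagonal(R_psd, 1.0)
    return R_psd

# An internally inconsistent target: r12 = r13 = 0.9 but r23 = -0.9
R_bad = np.array([[ 1.0,  0.9,  0.9],
                  [ 0.9,  1.0, -0.9],
                  [ 0.9, -0.9,  1.0]])
print(np.linalg.eigvalsh(R_bad))        # one eigenvalue is negative
R_fixed = clip_to_psd_correlation(R_bad)
print(np.linalg.eigvalsh(R_fixed))      # all nonnegative now
```

Note that dividing by the diagonal is a congruence transform, so it can't reintroduce negative eigenvalues.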

So if anyone has any further thoughts on this, or if you're interested in
the results, please let me know.  And thanks again for the help I've gotten
so far.

Rich Strauss

At 12:00 PM 11/17/99 -0500, you wrote:
>On 16 Nov 1999 13:29:31 -0800, Rich Strauss ([EMAIL PROTECTED]) wrote:
>
>> I have a problem that I had initially thought would be straightforward (but
>> then, what is?).  For a Monte Carlo-type simulation study, I want to be
>> able to generate sets of pseudorandom numbers having correlations equal
>> to (or differing only randomly from) a target correlation matrix that I
>> specify up front, based on postulated relationships among variables.  This
>> is very easy to do using the classic method of Kaiser & Dickman (1962), as
>> long as the target correlation matrix is positive definite (PD) (ie, has
>> all positive eigenvalues).  If not, the algorithm (programmed in Matlab)
>> returns complex numbers, which are not satisfactory for my purposes.
>> 
>> So, for a non-PD target correlation matrix, I decided to find the PD matrix
>> that is "closest" to the target matrix in some sense.    ...
>
>Slow down;  stop;  back up.
>
>You don't say what your Monte Carlo is for, and why you are putting in
>a variety of correlations, but you don't seem to be taking this "bad
>conditioning"   seriously enough.  -- Look at it this way:  If you set
>yourself up with a matrix that is the next-closest thing to an invalid
>correlation matrix, you are going to get the next-closest thing to
>invalid results -- In this case, it seems that you are planning to do
>it  without ever measuring or recording just how close you are to the
>limit, because you are just (blindly) approximating some target.
>
>It seems to me a  Monte Carlo study across various correlations can do
>one of two things:  It can concentrate on small correlations, where
>there is not a problem; or it can describe the whole problem, and
>arrange its correlations, within the limits of the particular multiple
>R-squared, or conditioning, or some other suitable index.  I know that
>I read a study that did this, a couple of years ago (maybe, JAMA?),
>and I know that 10 or so years ago, I read a couple of (wretched)
>studies that failed to take this into account.
>
>Sorry, I don't have a more specific reference.
>
>>                                                   Somewhere in the
>> past I had gotten the idea that, for a correlation matrix to be PD, all of
>> the pairwise correlations must be internally consistent with respect to all
>> of their partial correlations.  
>
> - I don't know, maybe you get enough information if you insist that
>multiple Rs be legal?
>
>-- 
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html
> 
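P.S. One concrete way to implement Rich Ulrich's "legal multiple Rs" suggestion, if I've understood it: for an invertible correlation matrix R, the squared multiple correlation of variable i on all the others is 1 - 1/(R^-1)_ii, so values outside [0, 1] (or a singular R) flag an inconsistent set of correlations.  A hypothetical sketch:

```python
import numpy as np

def squared_multiple_correlations(R):
    """R-squared of each variable regressed on all the others.

    For an invertible correlation matrix R, SMC_i = 1 - 1/(R^-1)_ii.
    Values outside [0, 1], or a non-invertible R, indicate an
    internally inconsistent ("illegal") set of correlations.
    """
    Rinv = np.linalg.inv(R)
    return 1.0 - 1.0 / np.diag(Rinv)

# A well-conditioned example: SMCs should all lie in [0, 1]
R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
print(squared_multiple_correlations(R))
```

This could be logged alongside each perturbed target matrix to record just how close to the limit a given simulation condition is.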

========================
Dr Richard E Strauss            
Biological Sciences              
Texas Tech University           
Lubbock TX 79409-3131

Email: [EMAIL PROTECTED]
Phone: 806-742-2719
Fax: 806-742-2963                             
========================
