Can anyone get me started in the literature on the 'small n, large p' problem?
A couple of references would be helpful. The problem I am faced with is consultees who do not understand why a simple logistic model is wrong with n = 50 (say 25 cases and 25 controls) and the number of variables (p) in the tens of thousands (say gene expressions or whole-genome genotypes). (I try to explain about 50 data points embedded in a very high-dimensional space, but they generally start getting that glazed look in their eyes.)

Bill

---
William D. Shannon, Ph.D.
Assistant Professor of Biostatistics in Medicine
Division of General Medical Sciences and Biostatistics
Washington University School of Medicine
Campus Box 8005, 660 S. Euclid
St. Louis, MO 63110
Phone: 314-454-8356
Fax: 314-454-5113
e-mail: [EMAIL PROTECTED]
web page: http://ilya.wustl.edu/~shannon
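
P.S. In case it helps frame the question, here is a rough sketch of the point about 50 points in a very high-dimensional space. It is only an illustration under stated assumptions: the function name, the sample sizes, and the use of a minimum-norm least-squares fit as a stand-in for the logistic model are all hypothetical choices, not taken from any reference. With p much larger than n, even pure-noise predictors will almost surely separate the 25 cases from the 25 controls perfectly, and under perfect separation the logistic maximum-likelihood estimate does not exist. The same fitted weights should do no better than chance on fresh noise.

```python
import numpy as np


def noise_separation_demo(n=50, p=10_000, n_test=500, seed=0):
    """Fit a linear rule to pure noise and report train/test accuracy.

    Assumption: a minimum-norm least-squares fit stands in for logistic
    regression; it is enough to show that the training data are linearly
    separable, which is what breaks the logistic fit.
    """
    rng = np.random.default_rng(seed)

    # n subjects, p predictors of pure noise (no real signal at all).
    X = rng.standard_normal((n, p))
    # Half controls (-1), half cases (+1), unrelated to X.
    y = np.repeat([-1.0, 1.0], n // 2)

    # With p > n the system X w = y is underdetermined; lstsq returns
    # the minimum-norm solution, which reproduces y on the training set.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    train_acc = float(np.mean(np.sign(X @ w) == y))

    # Fresh noise with fresh labels: the fitted rule carries no information.
    X_new = rng.standard_normal((n_test, p))
    y_new = np.repeat([-1.0, 1.0], n_test // 2)
    test_acc = float(np.mean(np.sign(X_new @ w) == y_new))

    return train_acc, test_acc


if __name__ == "__main__":
    train_acc, test_acc = noise_separation_demo()
    print(f"training accuracy: {train_acc:.2f}")
    print(f"test accuracy on new noise: {test_acc:.2f}")
```

The training accuracy here says nothing about the predictors, only about p exceeding n, which is the part I have trouble getting across.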
