Making parameter files for Multimix using R or Splus

Murray Jorgensen

The purpose of this message is to introduce a short S program to aid in
the creation of parameter files for the Multimix program. Multimix is a
Fortran 77 program for mixture model based cluster analysis written by
Lyn Hunt. It is described in the paper 'Mixture model clustering using
the Multimix program' by L. Hunt and M. Jorgensen (Australian & New
Zealand Journal of Statistics, 41, 1999, pp 153-171) and also in some
papers at ftp://ftp.math.waikato.ac.nz/pub/maj/ , from where the program
itself and related files may be downloaded, including the program
pfile.rs that I am describing here.

The distributions used by Multimix are built up from building blocks of
discrete distributions, (possibly multivariate) normal distributions,
and location models (also called 'conditional Gaussian' distributions)
which are p-variate normal distributions, except that the means may
depend on a (p+1)st discrete variable.

The model to be fitted is described to Multimix by way of a fully
numeric 'parameter file'. An interactive Fortran program 'read3' is
available to create parameter files. These can also be created in a text
editor either from scratch, or by editing older parameter files. An
explanation of the parameter file format may be found in the file
notes.ps at the above URL.

When the number of variables (attributes) is large, read3 can be tedious
to use, so I have written an S program to do the same job which may
prove more convenient to use for those familiar with R or Splus. The
following two examples should demonstrate its use.

Example 1.

Suppose we wish to make the following (within-cluster) distributional
assumptions for a data set with nine attributes:

    Attribute: 1 2 3 4 5 6 7 8 9
    Var type:  C D D C C C D D C
               +     +            +Bivariate Normal
                 *     *          *Location [3-category]
                   +              +Discrete [binary]
                         * *      *Location [5-category]
                             +    +Discrete [4-category]
                               *  *Univariate Normal
               1 2 3 4 5 6 7 8 9
               C D D C C C D D C

Here C stands for continuous, D for discrete.

Then in the R or Splus command window make the following assignments:

    dvars <- c(3,8)
    dlevs <- c(2,4)
    nvars <- list(list(1,4),list(9))
    lvars <- list(list(2,5),list(7,6))
    llevs <- c(3,5)
    file <- "d:/writing/multimix/examp.par" # or whatever

and paste in the S program. The file 'examp.par' that is created
follows:

    ngroups nobs 9 6 2 3 6 1 4 7 9 8 2 5 1 1 2 1 2 2 0 0 2 1 1 1 1 2 3 5 6 8 1 2 4 5 7 9 1 1 2 2 3 3 1 1 2 2 2 3 4 3 4 2 4 0 0 0 3 0 5 0
    Replace this text by initial classification
    (class assignment for each observation)

To fit a model with 2 groups to a data set with 100 observations,
replace 'ngroups' by 2, 'nobs' by 100, and the last two lines of the
file by some (possibly random) initial assignment like

    1 2 1 1 2 2 2 2 1 2 2 1 2 2 1 1 2 2 1 1 1 2 1 2 2
    1 2 2 2 1 1 1 1 2 1 2 1 2 1 2 2 1 1 1 2 2 2 1 1 2
    1 2 2 2 2 2 1 2 2 2 2 2 1 1 2 2 1 1 1 2 1 2 1 2 2
    1 2 2 1 2 1 2 1 2 2 2 2 1 2 2 1 1 1 2 1 1 2 2 2 2

Example 2.

This is based on data that I am working on now. There are 33 binary
variables in the first 33 input columns, followed by 21 continuous
variables. I am in the early stages of exploration with this data set,
so I am using a model with full local independence. This implies 33
discrete variables, 21 univariate normals, and no location models. To
set this up I make the initial assignments:

    dvars <- 1:33
    dlevs <- rep(2,33)
    nvars <- lapply(34:54,as.list)
    lvars <- NULL
    llevs <- numeric(0)

The output file will need editing to supply the number of groups, the
number of observations, and the initial assignment as in the first
example.
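
Both examples end with an initial assignment that has to be typed or
pasted into the parameter file. A minimal sketch of one way to draw a
random assignment in R follows. The function name random_assignment and
its per_line argument are hypothetical and are not part of pfile.rs; the
choice of 25 labels per line is only an assumption made for readability,
since the line layout Multimix expects is documented in notes.ps rather
than here.

```r
# Hypothetical helper (not part of pfile.rs): draw a random initial
# classification and format it as text lines ready for pasting.
random_assignment <- function(nobs, ngroups, per_line = 25) {
  # one independent, equally likely label in 1..ngroups per observation
  labels <- sample(ngroups, nobs, replace = TRUE)
  # group the labels into rows of at most per_line values
  rows <- split(labels, ceiling(seq_along(labels) / per_line))
  # join each row into a single space-separated string
  as.vector(sapply(rows, paste, collapse = " "))
}

set.seed(1)   # only so that the same draw can be reproduced
init <- random_assignment(nobs = 100, ngroups = 2)
cat(init, sep = "\n")
```

The printed lines can then replace the last two lines of the generated
parameter file. A purely random draw can occasionally leave a group with
very few observations, so it is worth looking at table() of the labels
before using them.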

The EM algorithm used by Multimix may also be initialized by specifying
starting parameter values, but it is usually easiest to do this from
parameter output files created by Multimix itself.

2000-11-09

Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
*Applications Editor, Australian and New Zealand Journal of Statistics*
[EMAIL PROTECTED]   Phone +64-7 838 4773   home phone 856 6705   Fax 838 4155
