Problems like this are studied extensively by the data mining community.
One highly cited paper in this domain is at

http://citeseer.nj.nec.com/agrawal98automatic.html

Strictly speaking, that paper is about finding simple clusters, similar
to the ones you describe below, in *high*-dimensional data, which is not
exactly what you want. You may also want to check out

http://www.autonlab.org/autonweb/showPaper.jsp?ID=pelleg-mixtures

If you happen to know any data mining experts, you could consult them,
too.

Hope that helps.

In sci.stat.math saisat <[EMAIL PROTECTED]> wrote:
> All:
> I have a customer database of close to 5 million customers.
> Each customer has different category variables (like Customer Type,
> Country of Origin etc.) and different range variables (like Daily
> Transaction Amount, Daily Transaction Count etc.). I need to segment
> these customers into different groups or clusters in which the
> members of a group share common characteristics.
> For example, if I have the data set
>
> Id  Ctry  CustomerType  DailyTransactionAmt
> 1   IQ    CType 1       2000
> 2   IQ    CType 1       3000
> 3   IQ    CType 1       4000
> 4   IQ    CType 1       3000
> 5   IQ    CType 1       10000
> 6   IQ    CType 1       11000
> 7   IQ    CType 1       12000
> 8   IQ    CType 1       11000
> 9   IN    CType 1       10000
> 10  IN    CType 1       15000
> 11  IN    CType 1       55000
> 12  IN    CType 1       60000
> 13  IN    CType 1       70000
> 14  IQ    CType 2       85000
> 15  IQ    CType 2       75000
> 16  IQ    CType 2       90000
> 17  IQ    CType 2       10000
> 18  IQ    CType 2       3500
> 19  IQ    CType 2       3000
> 20  IQ    CType 2       4000
> 21  IQ    CType 2       4000
> 22  IN    CType 2       1100
> 23  IN    CType 2       1000
>
> I need an output like
>
> CType1 ---- IQ -- (2000 <= amt <= 4000) [Members: 1,2,3,4]
> CType1 ---- IQ -- (10000 <= amt <= 12000) [Members: 5,6,7,8]
> CType1 ---- IN -- (10000 <= amt <= 15000) [Members: 9,10]
> CType1 ---- IN -- (55000 <= amt <= 70000) [Members: 11,12,13]
> CType2 ---- IQ -- (75000 <= amt <= 100000) [Members: 14,15,16,17]
> CType2 ---- IQ -- (3000 <= amt <= 40000) [Members: 18,19,20,21]
> CType2 ---- IN -- (1000 <= amt <= 1100) [Members: 22,23]
>
> Please note that I don't know the number of clusters beforehand.
> I am new to this area and am reading up on different material, and I
> would appreciate any suggestions you can provide.
> Thanks
> Satish
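
For the small example quoted above, a rough way to get a feel for the problem before reaching for a real clustering method is to group the rows by the category variables and then cut each group's sorted amounts wherever there is a large jump. The sketch below does that in Python. It is only an illustration and is not the method from either paper; the names `ROWS`, `summarize` and `segment`, and the rule "start a new cluster when the next amount is more than 2x the previous one", are my own assumptions, and the threshold would need tuning on real data. It does not need the number of clusters in advance.

```python
from itertools import groupby

# Sample rows as posted: (id, country, customer_type, daily_transaction_amt)
ROWS = [
    (1, "IQ", "CType1", 2000), (2, "IQ", "CType1", 3000),
    (3, "IQ", "CType1", 4000), (4, "IQ", "CType1", 3000),
    (5, "IQ", "CType1", 10000), (6, "IQ", "CType1", 11000),
    (7, "IQ", "CType1", 12000), (8, "IQ", "CType1", 11000),
    (9, "IN", "CType1", 10000), (10, "IN", "CType1", 15000),
    (11, "IN", "CType1", 55000), (12, "IN", "CType1", 60000),
    (13, "IN", "CType1", 70000), (14, "IQ", "CType2", 85000),
    (15, "IQ", "CType2", 75000), (16, "IQ", "CType2", 90000),
    (17, "IQ", "CType2", 10000), (18, "IQ", "CType2", 3500),
    (19, "IQ", "CType2", 3000), (20, "IQ", "CType2", 4000),
    (21, "IQ", "CType2", 4000), (22, "IN", "CType2", 1100),
    (23, "IN", "CType2", 1000),
]


def summarize(ctype, ctry, members):
    """Return (type, country, min amount, max amount, sorted member ids)."""
    amounts = [row[3] for row in members]
    ids = sorted(row[0] for row in members)
    return (ctype, ctry, min(amounts), max(amounts), ids)


def segment(rows, ratio=2.0):
    """Group by (customer_type, country), then split each group's sorted
    amounts wherever the next amount exceeds `ratio` times the previous one.
    The ratio rule is an assumed heuristic, not a standard algorithm."""
    def key(row):
        return (row[2], row[1])

    segments = []
    for (ctype, ctry), group in groupby(sorted(rows, key=key), key=key):
        ordered = sorted(group, key=lambda row: row[3])
        current = [ordered[0]]
        for row in ordered[1:]:
            if row[3] > ratio * current[-1][3]:
                # Large jump: close the current cluster and start a new one.
                segments.append(summarize(ctype, ctry, current))
                current = [row]
            else:
                current.append(row)
        segments.append(summarize(ctype, ctry, current))
    return segments


if __name__ == "__main__":
    for ctype, ctry, low, high, ids in segment(ROWS):
        members = ",".join(str(i) for i in ids)
        print(f"{ctype} ---- {ctry} -- ({low} <= amt <= {high}) [Members: {members}]")
```

On the posted data this reproduces the CType1 groups and the CType2/IN group from your desired output. Row 17 comes out as a cluster of its own, because its amount as posted (10000) lies far from both the 3000-4000 rows and the 75000-90000 rows. With several range variables at once (amount and count, say), a one-dimensional cut like this is no longer enough, and that is where the clustering literature above comes in.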
