Problems like this are extensively studied by the data mining community.

One highly cited paper in this domain is:

http://citeseer.nj.nec.com/agrawal98automatic.html

Strictly speaking, that paper is about finding simple clusters, similar
to the ones you describe below, in *high*-dimensional data, which is not
exactly what you want.

You may also want to check out
http://www.autonlab.org/autonweb/showPaper.jsp?ID=pelleg-mixtures

If you happen to know a data mining expert, you could consult
them, too.
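
For a feel of the kind of output you describe, here is a minimal sketch in
Python. It is not the algorithm from either paper above, just a simple
heuristic I am assuming for illustration: group the customers by
(customer type, country), sort each group by amount, and start a new
segment wherever an amount is more than rel_gap times the previous one.
The names segment, _summarize and rel_gap are hypothetical, the default of
2.0 is an arbitrary choice you would have to tune, and amounts are assumed
to be positive. The number of segments is not fixed in advance; it falls
out of the gaps in the data.

```python
from collections import defaultdict


def segment(rows, rel_gap=2.0):
    """rows: iterable of (id, country, customer_type, amount) tuples.

    Returns a list of (customer_type, country, low, high, member_ids).
    rel_gap is a hypothetical tuning knob: a new segment starts whenever
    an amount is more than rel_gap times the previous (sorted) amount.
    """
    # Step 1: bucket customers by their categorical variables.
    groups = defaultdict(list)
    for cid, ctry, ctype, amt in rows:
        groups[(ctype, ctry)].append((amt, cid))

    # Step 2: within each bucket, sort by amount and split at large jumps.
    result = []
    for (ctype, ctry) in sorted(groups):
        members = sorted(groups[(ctype, ctry)])
        current = [members[0]]
        for prev, item in zip(members, members[1:]):
            if item[0] > rel_gap * prev[0]:
                result.append(_summarize(ctype, ctry, current))
                current = []
            current.append(item)
        result.append(_summarize(ctype, ctry, current))
    return result


def _summarize(ctype, ctry, members):
    """Collapse one segment into its amount range and sorted member ids."""
    amounts = [amt for amt, _ in members]
    ids = sorted(cid for _, cid in members)
    return (ctype, ctry, min(amounts), max(amounts), ids)


# Rows 9-13 and 22-23 from the table quoted below.
rows = [
    (9, "IN", "CType 1", 10000),
    (10, "IN", "CType 1", 15000),
    (11, "IN", "CType 1", 55000),
    (12, "IN", "CType 1", 60000),
    (13, "IN", "CType 1", 70000),
    (22, "IN", "CType 2", 1100),
    (23, "IN", "CType 2", 1000),
]

for seg in segment(rows):
    print(seg)
```

On those rows this gives members 9,10 (10000 to 15000), 11,12,13 (55000 to
70000) and 22,23 (1000 to 1100). On the full table as posted, customer 17
(amount 10000) ends up in a segment of its own rather than with 14,15,16,
because 10000 lies outside the 75000-100000 range you list for that group.
A fixed ratio like this is crude; for 5 million customers with several
range variables you would want one of the methods in the papers above.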


Hope that helps.


In sci.stat.math saisat <[EMAIL PROTECTED]> wrote:
> All:

> I have a customer database of close to 5 million customers.
> Each customer has different category variables (like Customer Type,
> Country of Origin etc) and different range variables (like Daily
> Transaction amount, Daily Transaction Count etc). I need to segment
> these customers into different groups or clusters in which the
> members of a group share common characteristics.

> For example, if I have the data set

> Id    Ctry    CustomerType    DailyTransactionAmt     
> 1     IQ      CType 1         2000
> 2     IQ      CType 1         3000
> 3     IQ      CType 1         4000
> 4     IQ      CType 1         3000
> 5     IQ      CType 1         10000
> 6     IQ      CType 1         11000
> 7     IQ      CType 1         12000
> 8     IQ      CType 1         11000
> 9     IN      CType 1         10000
> 10    IN      CType 1         15000
> 11    IN      CType 1         55000
> 12    IN      CType 1         60000
> 13    IN      CType 1         70000
> 14    IQ      CType 2         85000
> 15    IQ      CType 2         75000
> 16    IQ      CType 2         90000
> 17    IQ      CType 2         10000
> 18    IQ      CType 2         3500
> 19    IQ      CType 2         3000
> 20    IQ      CType 2         4000
> 21    IQ      CType 2         4000
> 22    IN      CType 2         1100
> 23    IN      CType 2         1000            


> I need an output like


> CType1 ---- IQ -- (2000 <= amt <= 4000) [Members: 1,2,3,4]
> CType1 ---- IQ -- (10000 <= amt <= 12000) [Members: 5,6,7,8]
> CType1 ---- IN -- (10000 <= amt <= 15000) [Members: 9,10]
> CType1 ---- IN -- (55000 <= amt <= 70000) [Members: 11,12,13]
> CType2 ---- IQ -- (75000 <= amt <= 100000) [Members: 14,15,16,17]
> CType2 ---- IQ -- (3000 <= amt <= 40000) [Members: 18,19,20,21]
> CType2 ---- IN -- (1000 <= amt <= 1100) [Members: 22,23]


> Please note that I don't know the number of clusters beforehand.
> I am new to this area and am reading up on different material, and I
> would appreciate any suggestions you can provide.

> Thanks
> Satish
