I was wondering if anyone could help me with an interesting problem. I am trying to forecast customer life span for a set of data.
Basically, we have 8 years of data and thousands of rows for a subscription service. The three raw variables are as follows.

a) Starting date of subscription
b) Cancellation date of subscription
c) Demographic segment that a customer belongs to. There are 66 categorical values such as 01, 02, etc. These segments are given to us by an outside firm that appends a segment to each customer record based on variables such as what kind of car the customer drives, her level of education, how much she earns, etc.

I am interested in predicting the number of months a customer will stay with the product. I was thinking I could use the following variables in my regression model.

Dependent variable: NumberOfMonths (derived by taking the difference between the starting and ending dates of the subscription, both for cancelled customers and for customers who are still with us)

Independent variables:
a) Status (whether a customer has cancelled (0) or is still with us (1))
b) Demographic segment

Questions:

Q1) Is it OK to calculate the "NumberOfMonths" variable from the starting and ending dates of the subscription? The reason I ask is that for customers who have not cancelled yet, the result is only the number of months observed so far, which looks no different from the value for a customer who cancelled after the same number of months. Of course, this information (whether the subscription was cancelled) will simultaneously be captured in the "Status" independent variable (0 or 1).

Q2) I don't know how to use the "Demographic segment" independent variable, since there are 66 different numeric codes for these segments. Should I use 65 (= 66 - 1) dummy variables? If I do use 65 dummy variables, my regression equation may be not only extremely long but also potentially meaningless (dealing with so many variables).

Q3) What extra information do you think I may need in order to create this model?

Q4) Should I use the starting year in my model as well?
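To make the setup concrete, here is a rough sketch in Python of how I would derive the variables described above. It is only an illustration under several assumptions of mine: the function names (months_between, build_row), the as-of date used as the "ending date" for customers who are still active, the rounding down to whole months, and the idea that the segment codes run contiguously from "01" to "66" are all hypothetical and not taken from our actual data.

```python
from datetime import date

def months_between(start, end):
    """Whole months elapsed from start to end (partial months are dropped)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months

def build_row(start, cancel, segment, as_of, segments):
    """Derive NumberOfMonths, Status and the segment dummies for one customer."""
    # Status coding as in the post: 0 = cancelled, 1 = still with us.
    status = 0 if cancel is not None else 1
    # Assumption: for active customers the as-of date stands in for the ending date.
    end = cancel if cancel is not None else as_of
    # 66 segments -> 65 dummies; the first segment is the reference level.
    dummies = [1 if segment == s else 0 for s in segments[1:]]
    return {
        "NumberOfMonths": months_between(start, end),
        "Status": status,
        "SegmentDummies": dummies,
    }

# Hypothetical values, for illustration only.
SEGMENTS = [f"{i:02d}" for i in range(1, 67)]  # assumed codes "01" .. "66"
AS_OF = date(2005, 1, 1)                       # assumed data extraction date

cancelled = build_row(date(2001, 3, 15), date(2002, 9, 10), "02", AS_OF, SEGMENTS)
active = build_row(date(2003, 6, 1), None, "01", AS_OF, SEGMENTS)

print(cancelled["NumberOfMonths"], cancelled["Status"])
print(active["NumberOfMonths"], active["Status"])
```

Note that the active customer's NumberOfMonths depends entirely on the as-of date chosen, which is the concern behind Q1, and that a customer in the reference segment gets all 65 dummies equal to zero, which is the coding Q2 asks about.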
Forecasting customer life span for a subscription service seems to be a common business problem, and I was wondering if anyone has a canned solution or could provide me with pointers.
