I think this is formally equivalent to the problem of estimating the
lifetime of a manufactured product such as, say, an incandescent
electric light bulb. Data are commonly censored to some degree, as
yours are, because it is uneconomic to keep on observing the last 1% or
so of one's sample until they finally give up the ghost.
("Censored" means that some values in your data are not known exactly:
all you know is that they lie below some limit ("censored on the
left") or above some limit ("censored on the right", as your data are,
since a customer who has not yet cancelled has a lifetime at least as
long as the time observed so far). It is possible for data to be
censored at both ends, but you don't have to worry about that... I
think.)
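To make the censoring concrete, here is a rough Python sketch of how each
subscription might be turned into a (duration, observed) pair. The
cut-off date STUDY_END, the function names and the two sample rows are
all my own inventions for illustration, not anything from your data.

```python
from datetime import date

# Hypothetical end of the observation window (an assumption for this sketch).
STUDY_END = date(2004, 5, 1)

def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months

def to_observation(start, cancel, study_end=STUDY_END):
    """Return (duration_in_months, observed) for one subscription.

    observed is True when the cancellation was actually seen, and False
    when the customer was still subscribed at study_end (right-censored).
    """
    if cancel is not None and cancel <= study_end:
        return months_between(start, cancel), True
    return months_between(start, study_end), False

# Made-up rows: (starting date, cancellation date or None if still active).
rows = [
    (date(2000, 1, 15), date(2001, 1, 15)),
    (date(2003, 3, 1), None),
]
observations = [to_observation(s, c) for s, c in rows]
```

The second row is the censored case: all that is known is that the
lifetime is at least the number of months observed so far.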
Years ago, when I was looking up references on censored distributions
and the estimation of parameters (the mean, e.g.) of the underlying
(and therefore unobserved) UNcensored distributions, the most helpful
papers I found were written by engineers at General Electric. Sorry,
that was very long ago and I don't remember the references or the names
of the investigators. But a search on "censored" should yield lots of
hits.
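As a small illustration of estimating a parameter of the underlying
uncensored distribution, here is a sketch in Python. It ASSUMES the
lifetimes are exponentially distributed, which is my simplification
and may well not suit your data. Under that assumption the
maximum-likelihood estimate of the mean is the total time observed
(censored cases included) divided by the number of cancellations
actually seen. The function name and the sample values are made up.

```python
def exponential_mean_lifetime(observations):
    """MLE of the mean lifetime under an assumed exponential model.

    observations is a list of (duration, observed) pairs, where observed
    is False for right-censored cases (still subscribed when last seen).
    """
    total_time = sum(duration for duration, _ in observations)
    events = sum(1 for _, observed in observations if observed)
    if events == 0:
        raise ValueError("no cancellations observed; mean not estimable")
    return total_time / events

# Made-up sample: three cancellations and two customers still active.
sample = [(6, True), (12, True), (18, True), (24, False), (24, False)]

# Averaging only the completed lifetimes ignores the censored cases
# and so understates the mean.
completed = [d for d, observed in sample if observed]
naive_mean = sum(completed) / len(completed)

censoring_aware_mean = exponential_mean_lifetime(sample)
```

The gap between the two figures shows why simply dropping the customers
who have not yet cancelled gives too short a lifetime.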
RE: your "Demographic" variables with 66 categories:
You seem to have plenty of cases. When you've arrived at a useful form
of analysis, apply it separately to each category. You then may wish to
compare parameters across categories, to see whether the categories tend
to cluster in clumps, or to vary gradually, or whatever; or to see
which categories appear indistinguishable from which others; etc.
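One way the per-category comparison might be sketched in Python is
below. The choice of the Kaplan-Meier (product-limit) estimate and of
the median as the summary to compare is mine, as are the segment labels
and the numbers; substitute whatever form of analysis you settle on.

```python
from collections import defaultdict

def kaplan_meier(observations):
    """Return [(t, S(t))] at each time t with at least one cancellation.

    observations is a list of (duration, observed) pairs; observed is
    False for right-censored cases.
    """
    event_times = sorted({d for d, observed in observations if observed})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d, _ in observations if d >= t)
        events = sum(1 for d, observed in observations if observed and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def median_lifetime(curve):
    """First time at which estimated survival is 0.5 or less, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within the observation window

# Made-up records: (segment, duration in months, cancellation observed?).
records = [
    ("A", 3, True), ("A", 5, True), ("A", 5, True), ("A", 8, False), ("A", 9, True),
    ("B", 10, True), ("B", 12, False), ("B", 12, False), ("B", 12, False),
]

by_segment = defaultdict(list)
for segment, duration, observed in records:
    by_segment[segment].append((duration, observed))

medians = {seg: median_lifetime(kaplan_meier(obs))
           for seg, obs in by_segment.items()}
```

A segment whose median comes back as None is one where more than half
the customers were still subscribed when observation stopped, which is
itself worth knowing when you compare categories.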
On Wed, 5 May 2004, AJ wrote:
> I was wondering if anyone could help me with an interesting problem.
> I am trying to forecast customer life span for a set of data.
>
> Basically, we have 8 years data and thousands of rows regarding a
> subscription service. Three raw variables are as follows.
>
> a) Starting Date of subscription
> b) Cancellation Date of subscription
> c) Demographic Segments that a customer belongs to. We have 66
< snip >
>
> I am interested in predicting the number of months a customer would
> stay with the product.
<snip, the rest>
------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816