On 12 Apr 2000 15:21:21 -0700, [EMAIL PROTECTED] (Paul Bernhardt)
wrote:
> I suspect in this forum, almost as bad as the F-word or N-word are the
> DM-words... Data Mining... I agree, but wonder about criteria.
- Since IBM started touting a product by that name, it is hard to
ignore the new environment. It is still possible that someone
will start with a small amount of information and "torture the data
until it confesses." But online data collection produces databases
with millions of sales events, organized by date, store, etc. What
can be learned?
> Often in our various research domains we have no choice but to use
> retrospective data. A classic example might be validating an investment
> approach by examining historical data, which some call backtesting.
>
> What are the criteria, how can we know when we have chance findings?
>
Try to look for "independence," so that you have an N that gives you
increasing confidence, and use a cutoff more extreme than 5% -- though
you may be fooling yourself if you think that a reported p-value
below the 0.1% level is really accurate.
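
To make the point about cutoffs concrete, here is a minimal sketch
(mine, not taken from any package) of what happens when many candidate
rules are screened against pure noise. The function names, the count
of 1000 rules, and the 21 periods (one per year, 1973 through 1993)
are assumptions chosen for illustration. Every "rule" is a coin flip,
so anything that passes the cutoff is a chance finding.

```python
import math
import random

def upper_tail_p(wins, n):
    """Exact one-sided p-value: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def count_chance_findings(n_rules, n_periods, alpha, seed=0):
    """Count the no-skill rules that look 'significant' at level alpha.

    Each hypothetical rule beats the benchmark in a given period with
    probability 0.5, so there is nothing real to discover.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rules):
        wins = sum(rng.random() < 0.5 for _ in range(n_periods))
        if upper_tail_p(wins, n_periods) < alpha:
            hits += 1
    return hits

if __name__ == "__main__":
    for alpha in (0.05, 0.001):
        found = count_chance_findings(1000, 21, alpha)
        print(f"alpha={alpha}: {found} of 1000 no-skill rules pass")
```

The expected number of false "discoveries" is roughly alpha times the
number of rules tried (a bit less here, because the binomial is
discrete), which is why the cutoff has to get stricter as the search
gets wider.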
> I've argued that if the model is based on an a priori hypothesis, or can
> be justified by previously established theories, the possibility of data
> mining may be ignored. When the pre-existing theory is less substantial,
- How substantial is "less substantial," or how substantial was the
PRIOR? If you are already sure something is there, then okay, maybe
you don't need much more evidence. Right: more shoppers on a sunny
day, or on a payday.
> one may ask if the discovered model fits data not included in the
> original model (data which occurs after the model was discovered, or data
> which precedes the data originally used to create the model).
>
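
The check on data not used to build the model can be sketched the same
way. This is a hedged illustration only: the function name and the
parameters are hypothetical, and the "rules" are again coin flips with
no real edge. The best of many rules is picked on a fitting period and
then scored on a later holdout period.

```python
import random

def best_rule_holdout(n_rules, n_fit, n_holdout, seed=0):
    """Pick the best-looking no-skill rule in-sample, then score it out-of-sample.

    Returns (fit_rate, holdout_rate): the fraction of periods in which
    the selected rule beat the benchmark, inside and outside the
    fitting window.
    """
    rng = random.Random(seed)
    # True means the rule beat the benchmark in that period.
    rules = [
        [rng.random() < 0.5 for _ in range(n_fit + n_holdout)]
        for _ in range(n_rules)
    ]
    # Selection uses the fitting window only.
    best = max(rules, key=lambda r: sum(r[:n_fit]))
    fit_rate = sum(best[:n_fit]) / n_fit
    holdout_rate = sum(best[n_fit:]) / n_holdout
    return fit_rate, holdout_rate

if __name__ == "__main__":
    fit_rate, holdout_rate = best_rule_holdout(200, 21, 21)
    print(f"in-sample win rate:     {fit_rate:.2f}")
    print(f"out-of-sample win rate: {holdout_rate:.2f}")
```

Because the winner was chosen for its fitting-period record, that
record is biased upward; the holdout rate is the honest estimate, and
for a rule with no real edge it averages 0.5 over repeated runs.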
> I'd like to hear the views of people on this forum.
>
> The specific situation I'm referring to is an investment model called the
> Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm)
> which was found to beat the S&P500 and Dow 30 over the period from 1973
> through 1993. Since that date, and further backtested to 1961, it has not
> similarly beaten those traditional benchmark indexes, but also has not
> performed worse (both of which could be due to lack of power). The
> Foolish Four is based on a reasonable hypothesis that the worse
< snip >
One thing that remains true about stock investment schemes: There may
be some overall growth, somewhere, but in a specific, narrow
perspective, the whole market makes up a zero-sum game. If someone
wins, someone else has to lose.
IF there is an amount of regression-to-the-mean that you once were
able to count on, then AFTER it is publicized, it can't keep on
working for very long. If too many people try to cash in at once,
strict application of the formula can suddenly become a big loser.
Okay, you can work around the edges, and try to figure what stocks
really *ought* to have been the ones in that group, before eager
anticipation drove their prices up.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html