[amibroker] Re: Data mining bias vs number of observations

avlovestrading Mon, 21 Apr 2008 21:37:52 -0700

Thank you Brian for your excellent post. If you don't mind, please post
sometime your list of the other rocks that traders dash their ships on.


Cheers,
AV

--- In [email protected], "brian_z111" <[EMAIL PROTECTED]> wrote:
>
> Now that you have got me thinking about the subject I have decided to 
> pencil in some new rules to my 'little green book':
> 
> - datamining is a fancy name for 'tuning our system to a dataset'
> - anytime we change one system rule, by any amount, based on data 
> feedback, we are tuning, even if that dataset is produced in live 
> trading
> - the best test of a robust system is to submit it, without any 
> changes to the rules, to a dataset that is unknown to the system; 
> the system passes if the variance in the outcomes is low compared to 
> the previous test (provided the test samples are at least 300-400 
> trades).
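A minimal sketch of that last rule, assuming per-trade returns are available as plain lists of numbers. The function name `compare_samples`, the `MIN_TRADES` constant, and the idea of expressing the drift in standard errors are illustrative assumptions, not something stated in the post:

```python
from statistics import mean, stdev

MIN_TRADES = 300  # assumption: lower end of the 300-400 trade minimum in the post


def compare_samples(is_returns, oos_returns, min_trades=MIN_TRADES):
    """Compare per-trade returns of a tuning sample and an unseen sample.

    The rules must be frozen between the two runs; this function only
    measures how far the outcomes moved.
    """
    if min(len(is_returns), len(oos_returns)) < min_trades:
        raise ValueError(f"need at least {min_trades} trades in each sample")

    is_mean, oos_mean = mean(is_returns), mean(oos_returns)
    is_sd, oos_sd = stdev(is_returns), stdev(oos_returns)

    # Standard error of the difference between the two sample means.
    se_diff = (is_sd ** 2 / len(is_returns) + oos_sd ** 2 / len(oos_returns)) ** 0.5
    shift = oos_mean - is_mean
    return {
        "is_mean": is_mean,
        "oos_mean": oos_mean,
        "is_stdev": is_sd,
        "oos_stdev": oos_sd,
        "mean_shift": shift,
        # Shift in standard errors; undefined when both samples are constant.
        "shift_in_std_errors": shift / se_diff if se_diff > 0 else float("nan"),
    }
```

As a rough reading, a shift of more than about two standard errors would suggest the unseen sample is behaving differently from the sample the rules were tuned on.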
> 
> Hope that clarifies it for you.
> 
> brian_z
> 
> 
> --- In [email protected], "brian_z111" <brian_z111@> wrote:
> >
> > Hello Simon,
> > 
> > Great question.
> > 
> > I have an interest in Single Sample Testing (SST) and pushing the 
> > boundaries there. It is a big NO, NO to the 'defenders of the 
> > faith'.
> > 
> > I also have a strong bias to simple systems. No, or few, indicators 
> > with lookback periods etc (I don't use many rules/lose degrees of 
> > freedom) hence my interest in the subject.
> > 
> > My gut feeling tells me I can do it but I haven't got far with the 
> > proof (however that doesn't mean much since there are terabytes of 
> > books and academic research, out there, that I am totally unaware 
> > of).
> > 
> > Personally, I think SST only has academic interest.
> > I am following it because I am curious, I learn from the enquiry 
> > and I love to confound my critics.
> > 
> > So, possibly your friend is correct but if s/he is absolutely 
> > certain about it s/he would be capable of writing a book on 
> > evaluation - in fact if that is the case, I wish s/he would, 
> > thereby saving me a lot of time and trouble.
> > 
> > Anyway, over to the here and now.
> > 
> > > My question is, does anyone know if the data-mining bias can be
> > > considered irrelevant when the sample size is so large? (in this
> > > case, the sample size is roughly 8400 trades).
> > 
> > Possibly I can ride my motorbike, at 200mph, going the wrong way up 
> > a 6 lane highway but what is the point if I just want to get from A 
> > to B - am I going somewhere or thrill seeking?
> > 
> > Here are some rules from my notebook:
> > 
> > - good data, relevant to current conditions, is scarce. Why waste 
> > it?
> > - sample error is real
> > - around 300 to 400 trades is the minimum; beyond around 10,000 
> > trades there is no further substantial reduction in sample error
> > - there is a sweet spot around 1,000 - 5,000 trades
> > - if data is short then work with no fewer than 300-400 trades
> > - if data is in plentiful supply (intraday?) then use more
> > - one sample might be good enough (in exceptional circumstances/for 
> > exceptional traders) but why not reduce risk and use more (if you 
> > have the data)
> > - 1 IS and 1 OOS is better than 1 IS alone
> > - even though I am interested in SST, and more likely than most to 
> > succeed with it, I am actually using several OOS, of optimum 
> > length, whenever I can.
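The trade-count rules above can be illustrated with the textbook standard error of a proportion. This is only a sketch: it assumes trades are independent, and it borrows the 70% hit rate from the original question. The function name is hypothetical:

```python
import math


def win_rate_standard_error(win_rate, n_trades):
    """Standard error of an observed win rate, assuming independent trades."""
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


if __name__ == "__main__":
    for n in (100, 300, 400, 1000, 5000, 10000):
        half_width = 1.96 * win_rate_standard_error(0.70, n)
        print(f"{n:>6} trades: 70% win rate +/- {half_width:.1%} (approx. 95% interval)")
```

Under these assumptions the interval is roughly +/- 5.2% at 300 trades and +/- 0.9% at 10,000. Because the error shrinks with the square root of the trade count, each further halving costs four times as many trades, which fits the diminishing returns described in the list.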
> > 
> > No, 8400 trades in a single IS test does not guarantee success. It 
> > is very easy to find rare cases on a computer, because we can work 
> > our way through such large datasets in a relatively short space of 
> > time: a 1 in a million chance in real life can turn up in a single 
> > backtest on a computer.
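A toy simulation of that point, assuming every candidate rule is a pure coin flip with no real edge and that rules are independent of one another. The function name and the seed are illustrative only:

```python
import random


def best_random_rule_win_rate(n_rules, n_trades, seed=0):
    """Best observed win rate among n_rules coin-flip 'rules'.

    Every rule has a true win rate of 50%, so anything above that is luck.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_rules):
        # n_trades fair random bits; the number of 1-bits is the win count.
        wins = bin(rng.getrandbits(n_trades)).count("1")
        best = max(best, wins / n_trades)
    return best


if __name__ == "__main__":
    for n_rules in (1, 10, 100, 1000):
        best = best_random_rule_win_rate(n_rules, 8400)
        print(f"best of {n_rules:>4} edge-free rules over 8400 trades: {best:.2%}")
```

In this setting the lucky excess of the best rule can only grow as more rules are tried, even though no rule has any edge. It says nothing about rules that are fitted to the data rather than drawn at random, where the bias can be larger.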
> > 
> > We can't rely on stats alone - they never give a definitive answer.
> > 
> > Different story if your friend has observed a persistent, and 
> > predictable, market inefficiency and the stats are just confirming 
> > and quantifying that.
> > 
> > > Put another way, with so many observations, how many different
> > > rules would have to be back tested in order for data-mining bias
> > > to creep in?
> > 
> > I am still mulling over this point.
> > 
> > What is the smallest number of rules in which a useful system could 
> > be described? Perhaps three rules would be the fewest that anyone is 
> > successfully using (I don't know - I am wondering how few is 
> > possible).
> > 
> > Say I have a system with only three rules - if I test it IS and 
> > change 1 rule a little bit I am still tuning the system to that 
> > data, aren't I?
> > 
> > If I have a system with only three rules, test it IS, and it is 
> > successful, then test it OOS and it is successful, all I am doing 
> > is confirming that the system is tuned to those two particular 
> > datasets, aren't I?
> > 
> > Based on those observations I would say that, since we can't avoid 
> > data mining, even with simplistic methods, then we are always 'data 
> > mining' when we use historical data.
> > 
> > The only time we are not datamining is when we are live trading.
> > 
> > OOS testing is the historical surrogate for live trading, in that 
> > at least the data is unknown, to the system, prior to walkforward 
> > or OOS.
> > 
> > The only thing about datamining that varies, when we are using 
> > historical data, is the degree.
> > 
> > The more rules + the greater the range of adjustable parameters 
> > within the rules == the more likely we are to be 'fooled by 
> > randomness'.
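One hedged way to put numbers on this: count the system variants that the rules and their parameter ranges allow, then ask how likely it is that at least one edge-free variant passes a test by luck. The function names, the 5% false-positive rate, and the independence of variants are all assumptions made for illustration:

```python
import math


def count_variants(settings_per_rule):
    """Number of distinct systems when each rule can take several settings."""
    return math.prod(settings_per_rule)


def chance_of_lucky_pass(n_variants, alpha=0.05):
    """Chance that at least one edge-free variant passes by luck alone.

    Assumes independent variants; real parameter neighbours are
    correlated, so the figure is only a rough guide.
    """
    return 1.0 - (1.0 - alpha) ** n_variants


if __name__ == "__main__":
    for settings in ([1, 1, 1], [3, 3, 3], [10, 10, 10]):
        n = count_variants(settings)
        print(f"{settings}: {n} variants, lucky-pass chance {chance_of_lucky_pass(n):.1%}")
```

Under these assumptions a three-rule system with ten settings per rule already gives 1,000 variants, and at a 5% false-positive rate the chance of at least one lucky pass exceeds 50% from about 14 variants onwards.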
> > 
> > In short - no matter what we do we can never achieve 100% certainty 
> > but OOS and live paper trading will minimize the risk compared to 
> > SST alone.
> > 
> > Some food for thought:
> > 
> > Data mining, per se, is not the only thing on the list of 'rocks 
> > that traders dash their ships on' - there's more on the same list 
> > (most of them receive a lot less publicity).
> > 
> > brian_z
> > 
> > 
> > --- In [email protected], "si00si00" <si00si00@> wrote:
> > >
> > > Hi all,
> > > 
> > > I have a friend who has developed a trading system. It is an
> > > intraday system that makes on average around 5 futures trades per
> > > day. We were discussing it the other day and a point of
> > > disagreement arose between us. He claims that there is no
> > > necessity for him to test the strategy on out of sample data
> > > because he has back tested it using over 8 years of historical
> > > intraday data, and the patterns the strategy predicts occur 70%
> > > of the time or more.
> > > 
> > > My question is, does anyone know if the data-mining bias can be
> > > considered irrelevant when the sample size is so large? (in this
> > > case, the sample size is roughly 8400 trades). Put another way,
> > > with so many observations, how many different rules would have to
> > > be back tested in order for data-mining bias to creep in?
> > > 
> > > Thanks in advance for any thoughts you might have!
> > > 
> > > Simon
> > >
> >
>
