[amibroker] Re: Data mining bias vs number of observations

brian_z111 Tue, 22 Apr 2008 15:28:33 -0700

Thanks for your question.

It is a good, and necessary, thing to question new ideas.

No, you haven't misunderstood the implications of what I am saying.

First, to put it in context:

I am not commenting on Walk-Forward since I am not comfortable with 
it and I don't have the experience anyway (possibly the reason I am 
not comfortable with it).

I am referencing Fred and Howard to gain some insight into that area 
myself (to me training our systems 'on the fly' seems like a separate 
trading style to my own).

I am looking in another direction.

I am specifically 'researching' the grounds for deciding what metric 
to use when we get to the point of choosing our 'Objective Function', 
or as Fred calls it setting our 'Fitness, Goals and Constraints'; a 
decision that we have to make whenever we backtest, irrespective of 
the particular method we use (OOS, multiple OOS, Walk-Forward etc).

My comments are based on observations that I have made, using Excel 
spreadsheets to simulate the null hypothesis i.e. that the markets 
are a random walk and therefore all trading systems will revert to 0 
mathematical expectancy over time.

Luckily for me, those investigations uncovered a lot more than I 
originally bargained for - and yes it does have wider implications 
(if I am correct).

Some could argue that 'synthetic' equity curves, based on 
RandomlyGeneratedNumbers (RGN's) are not real, with regards to 
trading.

Well IMO they are a real simulation of the null hypothesis (give or 
take a bit of inaccuracy), and we can learn a lot about evaluation 
from them simply because we can 'stress test' known evaluation tools 
(concepts, equations, metrics etc) against data with known W/L, 
Payoff and ProfitFactor ratios etc.

My argument is that the above metrics (binomial factors) are the key 
inputs that drive equity outcomes and therefore how accurately we can 
predict their values, when using known 0 expectancy data, reflects 
how functional/pragmatic our evaluation techniques are.
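
As a rough illustration of that kind of bench test, here is a minimal Python sketch (not the Excel file itself). It assumes each trade result is drawn uniformly from [-1, 1], so expectancy is 0 and the true W/L, Payoff and ProfitFactor ratios are all 1. The function names and the distribution bounds are my own assumptions for illustration.

```python
import random

def simulate_trades(n, seed=None):
    """Return n synthetic trade results, uniform on [-1, 1] (zero expectancy).

    Assumption: a symmetric uniform distribution stands in for the
    spreadsheet's default settings; it is not taken from the file.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def binomial_factors(trades):
    """Measure W/L, Payoff and ProfitFactor from a list of trade results."""
    wins = [t for t in trades if t > 0]
    losses = [-t for t in trades if t < 0]      # stored as positive sizes
    wl = len(wins) / len(losses)                 # count of wins / count of losses
    payoff = (sum(wins) / len(wins)) / (sum(losses) / len(losses))  # avg win / avg loss
    profit_factor = sum(wins) / sum(losses)      # gross profit / gross loss
    return {"wl": wl, "payoff": payoff, "profit_factor": profit_factor}

if __name__ == "__main__":
    # Compare a small and a large sample against the known value of 1.0.
    for n in (100, 1000, 100000):
        print(n, binomial_factors(simulate_trades(n, seed=42)))
```

The point of the exercise is simply to see how far the measured ratios stray from their known values at small N.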

The observations I have made allow me to gain faith in some methods, 
lose faith in others and develop a few new ones of my own (in the 
very slippery world of evaluation, faith is a priceless commodity to 
me).

Yes, it is difficult for me to take it all on board, let alone anyone 
else who hasn't had the benefit of working through all of the 'bench 
tests'.

So, the implications are wider, but to answer your question I will 
focus on one aspect of my investigations i.e. sample error.
I will also limit the discussion on sample error to the basics 
(sample error is rather pervasive and has one or two surprising 
twists in the tail but I won't go into all of the nuances in this 
post).

Keep in mind, that my intention is basically to 'share' my work by 
asking people to think about it.

I am satisfied that a few are finding it interesting and stimulating.

Applications are entirely up to the individual.

Re sample error:

I have added a graph to the K-Ratio_v2.xls file that is in the file 
section of this group.

I have plotted the progressive W/L ratio for 1000 trades (W/L plots 
are one place where sample error is made blatantly obvious). 

F9 will force a recalc of the plot.

(Some people might be uncomfortable with the fact that I have used 
the uniform distribution format of the underlying RGN's to produce 
the 'synthetic' data but I can assure them I have done my homework 
with various distributions and the answer is the same).

Note that the W/L ratio, for the null hypothesis, is known to us in 
advance i.e. it is equal to 1/1 (this is with the default setting of 
Bias == 0.5, Volatility == 1 and the % factor as either 10 or 100 - 
DO NOT CHANGE THE %FACTOR TO 1)

The first thing you will notice is that the beginning of the plot 
is 'wild' and deviates a long way from the known value for the first 
approx 100 datapoints (this is predicted by the sample error 
equation, 1/sqrt(N), which == 10% for N == 100).
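
For anyone without Excel, a progressive W/L plot of this kind can be sketched in a few lines of Python. This is only an approximation of the spreadsheet: it assumes each trade is a win with probability 0.5, and the function names are hypothetical.

```python
import math
import random

def sample_error(n):
    """Sample error as a fraction, using 1/sqrt(N)."""
    return 1.0 / math.sqrt(n)

def progressive_wl(n, seed=None):
    """Running W/L ratio after each of n coin-flip trades (p(win) = 0.5).

    Entries are NaN until the first loss occurs, since W/L is undefined
    when there are no losses yet.
    """
    rng = random.Random(seed)
    wins = losses = 0
    ratios = []
    for _ in range(n):
        if rng.random() < 0.5:
            wins += 1
        else:
            losses += 1
        ratios.append(wins / losses if losses else float("nan"))
    return ratios

if __name__ == "__main__":
    ratios = progressive_wl(1000)   # no seed: a new curve each run, like pressing F9
    for n in (10, 50, 100, 400, 1000):
        print(f"N={n:>4}  W/L={ratios[n - 1]:.3f}  sample error={sample_error(n):.1%}")
```

Running it a few times shows the same behaviour as the plot: large swings early on, settling toward 1.0 as N grows.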

From observations I have made in other Excel benchtests I predict 
that the arithmetic mean of a number of trials (equity curves) will be 
very close to 1.0 and that the StDev of the final W/L ratio, in 
successive trials, will be 2 * the sample error% == 2 * 3.2% (the test 
uses 1000 datapoints in total).
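
One possible way to see where the factor of 2 could come from is a first-order approximation. This is a sketch under my own assumptions (independent trades, each a win with probability 0.5, no break-even trades), not something taken from the spreadsheet. Here W is the number of wins, R the W/L ratio and N the number of trades.

```latex
\begin{align*}
W &\sim \mathrm{Binomial}\!\left(N, \tfrac{1}{2}\right), \qquad
R = \frac{W}{N - W}, \qquad
\mathrm{sd}(W) = \frac{\sqrt{N}}{2} \\
\left.\frac{dR}{dW}\right|_{W = N/2}
  &= \left.\frac{N}{(N - W)^{2}}\right|_{W = N/2} = \frac{4}{N} \\
\mathrm{sd}(R) &\approx \frac{4}{N}\cdot\frac{\sqrt{N}}{2}
  = \frac{2}{\sqrt{N}}
  \qquad\Rightarrow\qquad
  N = 1000:\ \mathrm{sd}(R) \approx 0.063 = 2 \times 3.2\%
\end{align*}
```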

So, as F9 is repeatedly pressed, new plots will be created.
From N == 1 to around 300/400 the W/L ratio will be 'wild' then it 
will start to smooth out (statistical smoothing takes effect) and 
around 60% of the time the final W/L ratio will be within +- 1 StDev 
but around 1 in 100 times it will exceed 3 StDevs either way.
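
The repeated-F9 experiment can also be approximated in Python. The sketch below assumes coin-flip trades with p(win) = 0.5 and counts how often the final W/L ratio lands within 1 StDev or beyond 3 StDevs of the mean. The function names and trial count are hypothetical choices, and the results depend on those assumptions. For reference, a normal distribution would put about 68% of outcomes within 1 StDev.

```python
import random
import statistics

def final_wl(n, rng):
    """Final W/L ratio after n coin-flip trades."""
    wins = sum(1 for _ in range(n) if rng.random() < 0.5)
    return wins / (n - wins)

def run_trials(trials=2000, n=1000, seed=None):
    """Repeat the n-trade experiment and summarise the final W/L ratios.

    Returns (mean, stdev, fraction within 1 StDev, fraction beyond 3 StDevs).
    """
    rng = random.Random(seed)
    finals = [final_wl(n, rng) for _ in range(trials)]
    mean = statistics.mean(finals)
    sd = statistics.stdev(finals)
    within_1sd = sum(abs(f - mean) <= sd for f in finals) / trials
    beyond_3sd = sum(abs(f - mean) > 3 * sd for f in finals) / trials
    return mean, sd, within_1sd, beyond_3sd

if __name__ == "__main__":
    mean, sd, within_1sd, beyond_3sd = run_trials()
    print(f"mean W/L      = {mean:.3f}")
    print(f"StDev of W/L  = {sd:.3%}")
    print(f"within 1 SD   = {within_1sd:.1%}")
    print(f"beyond 3 SD   = {beyond_3sd:.2%}")
```

Comparing the printed StDev with 2 * 3.2% is a quick check on the prediction above.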

This is an inescapable fact.

Individually, we have to decide whether to ignore this or figure when 
and where to use it.

If we look at the plot, and also consider sample error for all N 
datapoints, we can easily see we have to choose a value for N 
somewhere above 100 (too wild below that) and somewhere below 1400 
because the gain of lower %error is outweighed by the consumption of 
valuable data (I am assuming here that we are all data challenged).

The choice we make is always a trade off between accuracy 
(statistical validity) and data consumption.

Note that above approx 1400 we are only decreasing sample error by 
the 4th decimal place for every extra datapoint we use OR to put that 
another way, error% is around 2.7 at 1400N and 1.0 at 10,000N, so we 
haven't gained that much accuracy for the additional 8600N consumed.
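
The diminishing return is easy to tabulate. The short sketch below assumes sample error% = 100/sqrt(N) and prints the error at a few values of N, along with the reduction gained from adding one more trade. The choice of N values is arbitrary.

```python
import math

def sample_error_pct(n):
    """Sample error in percent, assuming error% = 100 / sqrt(N)."""
    return 100.0 / math.sqrt(n)

def marginal_gain_pct(n):
    """Reduction in error% obtained by going from N to N + 1 trades."""
    return sample_error_pct(n) - sample_error_pct(n + 1)

if __name__ == "__main__":
    for n in (100, 400, 1000, 1400, 5000, 10000):
        print(f"N={n:>6}  error={sample_error_pct(n):5.2f}%  "
              f"gain from one more trade={marginal_gain_pct(n):.5f}%")
```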

For utility purposes (pragmatic application) - if we are 100% 
objective traders then we accept Fred's and Howard's opinion 
that "there is no substitute for OOS testing" so we need at least two 
samples that generate enough trades to pass our personal optimumN.

If we are EOD traders and use indicators with long lookback periods 
coupled with relatively rare signals then we might need a very large 
number of bars to generate our minimum number of trades (*2 for IS 
and OOS samples).
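
A back-of-envelope version of that arithmetic, as a sketch only: the function and its parameters are hypothetical, and it assumes one position at a time on a single symbol, with `bars_per_trade` meaning the average number of bars from one trade entry to the next.

```python
def years_of_data_needed(min_trades, bars_per_trade, samples=2, bars_per_year=200):
    """Rough years of EOD data needed so that each sample holds min_trades trades.

    samples=2 covers one IS and one OOS sample; bars_per_year=200 is the
    rule-of-thumb figure used in the quoted question.
    """
    total_bars = min_trades * bars_per_trade * samples
    return total_bars / bars_per_year

if __name__ == "__main__":
    # One trade every second bar, 1000 trades per sample, IS + OOS.
    print(years_of_data_needed(1000, 2))
    # Same trade frequency with a 300-trade minimum.
    print(years_of_data_needed(300, 2))
```

Slower systems scale the requirement up in direct proportion, which is why a basket of symbols or a lower minimum N becomes tempting.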

I don't know how others respond to the 'N facts of life' but it 
definitely influenced the way I trade, especially the frequency with 
which I trade.

Yes, every situation is unique based on the number of data bars 
available/average time in trade/average time waiting for a new signal 
etc (tick traders, the kings of data affluence, have at least 
ticks/minute*60*6 more 'bars' to play with than EOD traders).

IMO data is scarce for long term traders, and it is soon consumed.

That is why we 'instinctively' tend to compromise by lowering our 
minimal N requirements.

I think that answers your question.

Naturally I went past a lot of interesting side trails, in the 
interests of brevity (I can't write it and you can't read it all in 
one big bite).

cheers,

brian_z

PS - to explain this stuff does require the use of Excel examples.

The K-ratio file is on this site because the UKB was offline when I 
first posted it - it is not a political statement.

If I do post more, on this and related subjects, I am now unlikely to 
use the UKB as the vehicle - that isn't a political statement either.

I am just considering my own creative well being and like all artists 
I prefer having control of my canvas/workspace.

I am likely to continue occasional posting to the 
Data/DatabaseManagement categories at the UKB and move my original 
stats work elsewhere (I haven't made a final decision yet).

--- In [email protected], Thomas Ludwig 
<[EMAIL PROTECTED]> wrote:
>
> Brian,
> 
> your post is very interesting (as always) - but I'm puzzled! Perhaps I 
> simply misunderstood.
> 
> E.g., you wrote:
> 
> > Here are some rules from my notebook:
> >
> > - good data, relevant to current conditions, is scarce. Why waste it?
> > - sample error is real
> > - around 300 to 400 trades is the minimum, with no further
> > substantial minimization of sample error beyond, around 10,000
> > - there is a sweet spot around 1,000 - 5,000 trades
> > - if data is short then work with no less than 3-400
> > - if data is in plentiful supply (intraday?) then use more
> 
> Quite frankly, I'm not getting it. You say that the sweet spot is around 
> 1.000 - 5.000 trades (I assume for the IS period). So let's say for 
> simplicity, 1.000 trades minimum are desirable if you have enough data. 
> But what is enough data? As I haven't traded intraday so far I can't 
> answer this question for that style of trading. I'm trading daily 
> systems. Now let's assume that I have 10 years of daily data (would you 
> call that plentiful?). 1.000 trades mean 100 trades per year on average 
> or (if we assume 200 trading days by rule of thumb) one trade every 
> second day. Do your rules mean that an EOD system that doesn't produce 
> a trade at least every second day isn't testable/tradeable? And I'm 
> only talking about the IS period. What about OOS and walk-forward - 
> would I need, say, 20 years of data in your opinion to have enough data 
> for them?
> 
> Again, I assume that I simply misunderstood. Perhaps you were talking 
> about a system that trades a large basket of stocks in order to achieve 
> this large number of trades?
> 
> I'm really interested in your answer since your posts are always full of 
> hints worth thinking about.
> 
> Best regards,
> 
> Thomas
>
