Thanks for your question. It is a good, and necessary, thing to question new ideas.
No, you haven't misunderstood the implications of what I am saying.

First, to put it in context: I am not commenting on Walk-Forward, since I am not comfortable with it and I don't have the experience anyway (possibly the reason I am not comfortable with it). I am referencing Fred and Howard to gain some insight into that area myself (to me, training our systems 'on the fly' seems like a separate trading style to my own).

I am looking in another direction. I am specifically 'researching' the grounds for deciding what metric to use when we get to the point of choosing our 'Objective Function' or, as Fred calls it, setting our 'Fitness, Goals and Constraints'. That is a decision we have to make whenever we backtest, irrespective of the particular method we use (OOS, multiple OOS, Walk-Forward etc).

My comments are based on observations that I have made using Excel spreadsheets to simulate the null hypothesis, i.e. that the markets are a random walk and therefore all trading systems will revert to 0 mathematical expectancy over time. Luckily for me, those investigations uncovered a lot more than I originally bargained for - and yes, it does have wider implications (if I am correct).

Some could argue that 'synthetic' equity curves, based on RandomlyGeneratedNumbers (RGNs), are not real with regard to trading. Well, IMO they are a real simulation of the null hypothesis (give or take a bit of inaccuracy), and we can learn a lot about evaluation from them simply because we can 'stress test' known evaluation tools (concepts, equations, metrics etc) against data with known W/L, Payoff and ProfitFactor ratios etc. My argument is that the above metrics (binomial factors) are the key inputs that drive equity outcomes, and therefore how accurately we can predict their values, when using known 0 expectancy data, reflects how functional/pragmatic our evaluation techniques are.
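
For anyone without the spreadsheets, the 'stress test' idea above can be roughed out in a few lines of Python. This is only a sketch under my own assumptions, not a copy of the Excel files: trade results are uniform random numbers centred on zero (so expectancy is known to be 0), and the function names `synthetic_trades`, `equity_curve` and `binomial_factors` are made up for the illustration.

```python
import random
from itertools import accumulate

def synthetic_trades(n_trades=1000, seed=None):
    """Zero-expectancy trade results: uniform draws on [-0.5, 0.5).

    Assumption for this sketch only; the spreadsheet may build its
    random numbers differently.
    """
    rng = random.Random(seed)
    return [rng.random() - 0.5 for _ in range(n_trades)]

def equity_curve(trades):
    """Running sum of trade results (the 'synthetic' equity curve)."""
    return list(accumulate(trades))

def binomial_factors(trades):
    """Measure W/L, Payoff and ProfitFactor from a list of trade results."""
    wins = [t for t in trades if t > 0]
    losses = [-t for t in trades if t < 0]  # stored as positive sizes
    return {
        "wl_ratio": len(wins) / len(losses),
        "payoff": (sum(wins) / len(wins)) / (sum(losses) / len(losses)),
        "profit_factor": sum(wins) / sum(losses),
    }

if __name__ == "__main__":
    trades = synthetic_trades(seed=1)
    print("final equity:", round(equity_curve(trades)[-1], 3))
    for name, value in binomial_factors(trades).items():
        # All three are known to be 1.0 in the long run for this data.
        print(name, round(value, 3))
```

Because the true value of each ratio is 1.0 by construction, any gap between the printed figures and 1.0 is measurement error, which is exactly what the evaluation tools are being tested against.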

The observations I have made allow me to gain faith in some methods, lose faith in others and develop a few new ones of my own (in the very slippery world of evaluation, faith is a priceless commodity to me). Yes, it is difficult for me to take it all on board, let alone anyone else who hasn't had the benefit of working through all of the 'bench tests'.

So, the implications are wider, but to answer your question I will focus on one aspect of my investigations, i.e. sample error. I will also limit the discussion of sample error to the basics (sample error is rather pervasive and has one or two surprising twists in the tail, but I won't go into all of the nuances in this post).

Keep in mind that my intention is basically to 'share' my work by asking people to think about it. I am satisfied that a few are finding it interesting and stimulating. Applications are entirely up to the individual.

Re sample error: I have added a graph to the K-Ratio_v2.xls file that is in the file section of this group. I have plotted the progressive W/L ratio for 1000 trades (W/L plots are one place where sample error is made blatantly obvious). F9 will force a recalc of the plot. (Some people might be uncomfortable with the fact that I have used the uniform distribution format of the underlying RGNs to produce the 'synthetic' data, but I can assure them I have done my homework with various distributions and the answer is the same.)

Note that the W/L ratio, for the null hypothesis, is known to us in advance, i.e. it is equal to 1/1 (this is with the default setting of Bias == 0.5, Volatility == 1 and the % factor as either 10 or 100 - DO NOT CHANGE THE %FACTOR TO 1).

The first thing you will notice is that the beginning of the plot is 'wild' and deviates a long way from the known value for approximately the first 100 datapoints (this is predicted by the sample error equation == 10% for N == 100).
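
The progressive W/L plot can be imitated outside Excel with something like the sketch below. It is an assumption-laden stand-in, not the K-Ratio_v2.xls logic: a trade counts as a win when a uniform random draw falls below `bias`, the spreadsheet's Volatility and % factor settings are not modelled, and I am taking the sample error equation to be 1/sqrt(N), which is the form that reproduces the 10% at N == 100 figure. The names `progressive_wl_ratio` and `sample_error` are hypothetical.

```python
import random
from math import sqrt

def progressive_wl_ratio(n_trades=1000, bias=0.5, seed=None):
    """Return the running wins/losses ratio after each trade.

    A trade is a win when a uniform draw is below `bias`, so
    bias == 0.5 gives the null hypothesis (true W/L == 1/1).
    Entries are None until the first loss, when the ratio is undefined.
    """
    rng = random.Random(seed)
    wins = losses = 0
    ratios = []
    for _ in range(n_trades):
        if rng.random() < bias:
            wins += 1
        else:
            losses += 1
        ratios.append(wins / losses if losses else None)
    return ratios

def sample_error(n):
    """Sample error as a fraction, assuming the 1/sqrt(N) form."""
    return 1.0 / sqrt(n)

if __name__ == "__main__":
    ratios = progressive_wl_ratio(seed=1)
    for n in (10, 50, 100, 400, 1000):
        # Compare the running ratio with the sample error at that N.
        print(n, ratios[n - 1], f"{100 * sample_error(n):.1f}%")
```

Changing or dropping the seed plays the role of pressing F9: each run gives a new path that is erratic early on and settles toward 1.0 as N grows.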

From observations I have made in other Excel benchtests, I predict that the arithmetic mean of a number of trials (equity curves) will be very close to 1.0 and that the StDev of the final W/L ratio, in successive trials, will be 2 * the sample error% == 2 * 3.2% (the test uses 1000 datapoints in total).

So, as F9 is repeatedly pressed, new plots will be created. From N == 1 to around 300/400 the W/L ratio will be 'wild', then it will start to smooth out (statistical smoothing takes effect), and around 60% of the time the final W/L ratio will be within +- 1 StDev, but around 1 in 100 times it will exceed 3 StDevs either way. This is an inescapable fact. Individually, we have to decide whether to ignore this or figure out when and where to use it.

If we look at the plot, and also consider sample error for all N datapoints, we can easily see we have to choose a value for N somewhere above 100 (too wild below that) and somewhere below 1400, because the gain of lower %error is outweighed by the consumption of valuable data (I am assuming here that we are all data challenged). The choice we make is always a trade-off between accuracy (statistical validity) and data consumption. Note that above approx 1400 we are only decreasing sample error by the 4th decimal place for every extra datapoint we use, OR to put that another way, error% is around 2.7 at 1400N and 1.0 at 10,000N, so we haven't gained that much accuracy for the additional 8600N consumed.

For utility purposes (pragmatic application): if we are 100% objective traders then we accept Fred's and Howard's opinion that "there is no substitute for OOS testing", so we need at least two samples that each generate enough trades to pass our personal optimum N. If we are EOD traders and use indicators with long lookback periods coupled with relatively rare signals, then we might need a very large number of bars to generate our minimum number of trades (*2 for IS and OOS samples).
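
The repeated-trial prediction and the N trade-off above can be checked with a sketch along the following lines. Again this is an illustration under my own assumptions (a win is a uniform draw below 0.5, and sample error is taken as 1/sqrt(N)); `final_wl_ratio`, `run_trials` and `sample_error_pct` are invented names. For what it is worth, the 2 * sample error figure also drops out of a first-order approximation: with a win probability of 0.5, the StDev of W/L is roughly 2/sqrt(N).

```python
import random
from math import sqrt
from statistics import mean, stdev

def final_wl_ratio(n_trades, rng, bias=0.5):
    """Final wins/losses ratio for one simulated run of n_trades."""
    wins = sum(1 for _ in range(n_trades) if rng.random() < bias)
    losses = n_trades - wins
    return wins / losses  # assumes at least one loss (safe for large N)

def run_trials(n_trials=2000, n_trades=1000, seed=0):
    """Repeat the experiment, like pressing F9 n_trials times."""
    rng = random.Random(seed)
    return [final_wl_ratio(n_trades, rng) for _ in range(n_trials)]

def sample_error_pct(n):
    """Sample error in percent, assuming the 1/sqrt(N) form."""
    return 100.0 / sqrt(n)

if __name__ == "__main__":
    finals = run_trials()
    print("mean of final W/L :", round(mean(finals), 4))
    print("StDev of final W/L:", round(stdev(finals), 4))
    print("2 * sample error  :", round(2 * sample_error_pct(1000) / 100, 4))
    # Diminishing returns: error% against the number of trades consumed.
    for n in (100, 400, 1000, 1400, 5000, 10000):
        print(f"N={n:>6}  error%={sample_error_pct(n):.2f}")
```

The table at the end makes the data-consumption argument concrete: most of the reduction in error% happens in the first thousand or so trades, and each further step costs far more data than the one before it.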

I don't know how others respond to the 'N facts of life', but it definitely influenced the way I trade, especially the frequency with which I trade. Yes, every situation is unique, based on the number of data bars available/average time in trade/average time waiting for a new signal etc (tick traders, the kings of data affluence, have at least ticks/minute*60*6 times more 'bars' to play with than EOD traders). IMO data is scarce for long term traders, and it is soon consumed. That is why we 'instinctively' tend to compromise by lowering our minimal N requirements.

I think that answers your question. Naturally I went past a lot of interesting side trails, in the interests of brevity (I can't write it and you can't read it all in one big bite).

cheers,

brian_z

PS - to explain this stuff does require the use of Excel examples. The K-ratio file is on this site because the UKB was offline when I first posted it - it is not a political statement. If I do post more, on this and related subjects, I am now unlikely to use the UKB as the vehicle - that isn't a political statement either. I am just considering my own creative well-being and, like all artists, I prefer having control of my canvas/workspace. I am likely to continue occasional posting to the Data/DatabaseManagement categories at the UKB and move my original stats work elsewhere (I haven't made a final decision yet).

--- In [email protected], Thomas Ludwig <[EMAIL PROTECTED]> wrote:
>
> Brian,
>
> your post is very interesting (as always) - but I'm puzzled! Perhaps I
> simply misunderstood.
>
> E.g., you wrote:
>
> > Here are some rules from my notebook:
> >
> > - good data, relevant to current conditions, is scarce. Why waste it?
> > - sample error is real
> > - around 300 to 400 trades is the minimum, with no further
> > substantial minimization of sample error beyond, around 10,000
> > - there is a sweet spot around 1,000 - 5,000 trades
> > - if data is short then work with no less than 3-400
> > - if data is in plentiful supply (intraday?) then use more
>
> Quite frankly, I'm not getting it. You say that the sweet spot is around
> 1.000 - 5.000 trades (I assume for the IS period). So let's say for
> simplicity, 1.000 trades minimum are desirable if you have enough data.
> But what is enough data? As I haven't traded intraday so far I can't
> answer this question for that style of trading. I'm trading daily
> systems. Now let's assume that I have 10 years of daily data (would you
> call that plentiful?). 1.000 trades mean 100 trades per year on average
> or (if we assume 200 trading days by rule of thumb) one trade every
> second day. Do your rules mean that an EOD system that doesn't produce
> a trade at least every second day isn't testable/tradeable? And I'm
> only talking about the IS period. What about OOS and walk-forward -
> would I need, say, 20 years of data in your opinion to have enough data
> for them?
>
> Again, I assume that I simply misunderstood. Perhaps you were talking
> about a system that trades a large basket of stocks in order to achieve
> this large number of trades?
>
> I'm really interested in your answer since your posts are always full of
> hints worth thinking about.
>
> Best regards,
>
> Thomas
>
