Doing the same thing on two different sets of data seems identical to doing
it on one combined set of data.  How is it different?

On Tue, Dec 7, 2010 at 4:14 AM, Astor <[email protected]> wrote:

> The "in-sample" set is where you develop your model and optimize your
> parameters. Because optimization searches through a very large number of
> possible parameter values, it finds the values that best fit the data
> *in this set*. In a different data set, such as the one encountered in real
> trading, these parameters may prove perfectly useless. In quant research,
> such a situation is (derogatorily) referred to as "data mining" or
> overfitting. With enough model parameters and extensive optimization, I can
> get perfect accuracy predicting "in-sample" lottery winners. Of course, that
> model will not work to predict the next, "out-of-sample", lottery winner.
>
> The "out-of-sample" set is a way to verify that the found model and its
> parameters are general instead of unique to the "in-sample" development set.
> Combining the two sets into a single set defeats that purpose.
>
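The lottery point above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy trading rule, the parameter grid, and the names `strategy_pnl` and `optimize` are hypothetical and are not JBookTrader code. The "market" is pure noise, so any in-sample edge the optimizer finds is fitted noise, and only the held-out half can reveal that.

```python
import random

def strategy_pnl(returns, lookback, threshold):
    """Hypothetical toy rule: go long for one bar when the mean of the
    previous `lookback` returns exceeds `threshold`; otherwise stay flat."""
    pnl = 0.0
    for t in range(lookback, len(returns)):
        signal = sum(returns[t - lookback:t]) / lookback
        if signal > threshold:
            pnl += returns[t]
    return pnl

def optimize(returns, grid):
    """Exhaustive search: pick the (lookback, threshold) with the best P&L."""
    return max(grid, key=lambda params: strategy_pnl(returns, *params))

random.seed(0)
returns = [random.gauss(0.0, 1.0) for _ in range(2000)]  # pure noise, no edge
in_sample, out_of_sample = returns[:1000], returns[1000:]

# 20 lookbacks x 21 thresholds = 420 candidate parameter sets
grid = [(lb, th / 10.0) for lb in range(1, 21) for th in range(-10, 11)]

best = optimize(in_sample, grid)                  # tuned on in-sample only
in_pnl = strategy_pnl(in_sample, *best)           # optimistically biased
out_pnl = strategy_pnl(out_of_sample, *best)      # honest estimate

print("best params:", best)
print("in-sample P&L:", round(in_pnl, 2))
print("out-of-sample P&L:", round(out_pnl, 2))
```

If the optimizer were run on all 2000 points instead, it would report a best P&L for the combined set, but no data would be left that the search had not already seen, so there would be nothing to compare that number against.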
>  ------------------------------
> *From:* ShaggsTheStud <[email protected]>
> *To:* [email protected]
> *Sent:* Mon, December 6, 2010 10:21:59 PM
> *Subject:* Re: [JBookTrader] Re: Dynamic Parameter Optimization
>
> That whole "in sample" and "out of sample" data thing strikes me as
> very odd. If it works on the in-sample and not the out-of-sample, it's going
> to have a bad distribution as a single set, so why not just combine them?
>
> On Sun, Dec 5, 2010 at 5:56 AM, Astor <[email protected]> wrote:
>
>>   > we would
>> >be required to significantly shorten our optimization periods, thus
>> >incurring a penalty of standard error in our confidence bands.
>>
>> I understand your concern Eugene. However, it is important to recognize
>> that in strategy development and validation there are two sets of data and
>> two sets of confidence bands. The first set is used for strategy development
>> and parameter optimization and is often called "in-sample". The second set
>> is used only to validate the strategy performance and is called
>> "out-of-sample".
>>
>> If the confidence interval is very broad (standard error is large) in the
>> "in-sample" data, your strategy is not reliable and should not be used.
>>
>> If the "in-sample" results are good and have acceptable confidence
>> intervals, the next step is validation of the strategy on "out-of-sample"
>> data. Because "out-of-sample" data has not been used for parameter
>> optimization, the results obtained on this data are far more important than
>> those from "in-sample". If the "out-of-sample" confidence interval is too
>> broad, the validation results are not reliable and the strategy should not
>> be used.
>>
>> It is extremely common that the available data set is too small to
>> partition into in-sample and out-of-sample sets of adequate size.
>> In financial research, the data set size is usually limited not by data
>> availability but by data stationarity. To create valid sample sizes from
>> small data, a resampling technique such as "leave-one-out" (closely related
>> to "jackknifing" and, more loosely, to "bootstrapping") is used. In
>> leave-one-out, the model is developed on the entire data except for one
>> "holdout" point, then tested on this point. Then a different point is
>> selected and the process is repeated. The validation results are obtained
>> by combining the results of the holdout points. Walk-forward optimization
>> is an example of this technique and actually reduces standard error in the
>> more important "out-of-sample" test.
>>
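The holdout procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the "model" is a trivial mean predictor standing in for a real strategy, and `fit`, `leave_one_out_errors`, and `walk_forward_errors` are hypothetical names. The walk-forward variant shown here fits only on data that precedes the tested point, which is one common reading of the term.

```python
from statistics import mean

def fit(train):
    """Stand-in for model development: 'predict' the mean of the training data."""
    return mean(train)

def leave_one_out_errors(data):
    """Fit on everything except one holdout point, test on that point, repeat."""
    errors = []
    for i, held_out in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append(held_out - fit(train))
    return errors

def walk_forward_errors(data, window):
    """Fit on the trailing `window` points, test on the next point, roll forward."""
    errors = []
    for t in range(window, len(data)):
        errors.append(data[t] - fit(data[t - window:t]))
    return errors

data = [1.0, 2.0, 3.0, 4.0, 5.0]
loo = leave_one_out_errors(data)            # [-2.5, -1.25, 0.0, 1.25, 2.5]
wf = walk_forward_errors(data, window=2)    # [1.5, 1.5, 1.5]
print(loo)
print(wf)
```

In both cases every reported error comes from a point the fit never saw, and the validation result is the combination of those holdout errors.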
>> >better model would be the one which not only
>> >accounts for the supply/demand, but also for its changing elasticity
>> >over time
>>
>> That is definitely so and is often driven by seasonality as well as regime
>> shifts. For futures, such as ES, the elasticity could drift in response to
>> the proximity of the expiration date or as a result of changing market
>> sentiment or increased trading in spot or in "dark pools", which impacts
>> demand but is not reflected in bid/ask quotes.
>>
>> >the manner in which its parameters change over time is not intuitive at
>> >all
>>
>> If the value of the parameters themselves is not intuitive, then its
>> change over time is very likely not to be intuitive as well and vice versa.
>> Most non-intuitive parameter changes happen when the optimization surface is
>> very flat or has many local maxima. Then a minor change in the data can put
>> you at a very different local maximum and cause very unsettling parameter
>> jumps. That is why restricting the optimization region to the vicinity of
>> the most recent parameter values allows for parameters to only drift
>> gradually. Then trends in parameter changes can be spotted and understood
>> intuitively.
>>
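A minimal sketch of the restriction described above, assuming a single integer parameter and a made-up objective with two nearly equal peaks. The names `make_objective` and `constrained_argmax` and the `max_step` parameter are hypothetical. Between the two periods the heights of the peaks swap by a tiny amount, which is enough to make an unrestricted search jump.

```python
def make_objective(left_center, left_height, right_center, right_height):
    """Toy optimization surface with two local maxima (illustrative only)."""
    def objective(p):
        left = left_height - 0.01 * (p - left_center) ** 2
        right = right_height - 0.01 * (p - right_center) ** 2
        return max(left, right)
    return objective

def constrained_argmax(objective, candidates, previous, max_step):
    """Search only the candidates within `max_step` of the previous optimum."""
    nearby = [p for p in candidates if abs(p - previous) <= max_step]
    return max(nearby, key=objective)

candidates = list(range(0, 51))
period1 = make_objective(10, 1.00, 40, 0.99)   # left peak slightly higher
period2 = make_objective(11, 0.99, 40, 1.00)   # right peak slightly higher

previous = max(candidates, key=period1)         # 10
unconstrained = max(candidates, key=period2)    # 40, a jump of 30
constrained = constrained_argmax(period2, candidates, previous, max_step=3)  # 11

print(previous, unconstrained, constrained)
```

The restricted search gives up 0.01 of objective value in this toy case and in exchange reports a drift from 10 to 11 rather than a jump to 40.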
>>  ------------------------------
>> *From:* nonlinear5 <[email protected]>
>> *To:* JBookTrader <[email protected]>
>> *Sent:* Sat, December 4, 2010 11:34:20 PM
>> *Subject:* [JBookTrader] Re: Dynamic Parameter Optimization
>>
>> > Eugene, your comment goes to the need to have a sufficiently large
>> > backtest database relative to the number of adjustable parameters, so
>> > that the results are statistically significant. How does that relate to
>> > the potential non-stationarity of parameters?
>>
>> The non-stationarity of parameters is a problem, indeed. However, some
>> things are more or less absolute. Think of the supply/demand
>> relationship. If you can capture its essence in the strategy, that
>> should work today, tomorrow, and 10 years in the future. Now, I do
>> acknowledge that a better model would be the one which not only
>> accounts for the supply/demand, but also for its changing elasticity
>> over time. However, such a model would be more complex, more difficult
>> to understand, and more time-consuming to test. Perhaps more
>> importantly, while the supply/demand law by itself is quite intuitive,
>> the manner in which its parameters change over time is not intuitive at
>> all. The best we can hope for in our walk-forward optimization is that
>> whatever parameters were the "optimal" in a recent period would still
>> be the optimal in the next period. For the sake of this hope, we would
>> be required to significantly shorten our optimization periods, thus
>> incurring a penalty of standard error in our confidence bands.
>>
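The standard-error penalty mentioned above can be made concrete with the textbook formula for the standard error of a mean, sigma / sqrt(n). This is a sketch that assumes independent, identically distributed per-trade results, an assumption the thread itself questions when it raises non-stationarity. The function name and the sample sizes are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n i.i.d. observations with std dev sigma."""
    return sigma / math.sqrt(n)

long_window = standard_error(1.0, 400)    # 0.05
short_window = standard_error(1.0, 100)   # 0.1

# Cutting the optimization period to a quarter doubles the standard error.
print(long_window, short_window, short_window / long_window)
```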
>> --
>> You received this message because you are subscribed to the Google Groups
>> "JBookTrader" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to jbooktrader+
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/jbooktrader?hl=en.