Hi Pedro,

A Monte Carlo simulation is a great idea for multi-step prediction. One
concern is that the number of possible paths grows exponentially the
further you look into the future, and the simulation time will grow
similarly. Still, for a small number of steps it could work well.
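
To make the Monte Carlo idea concrete, here is a minimal sketch in plain
Python. It does not use the NuPIC API: `toy_predict_dist` is a hypothetical
stand-in for whatever returns the model's one-step predictions with their
probabilities, and the function names, the number of runs, and the seed are
all assumptions made for illustration. Each run samples one prediction per
step and feeds it back in, and the runs are then averaged per step.

```python
import random


def sample_path(predict_dist, start, n_steps, rng):
    """Roll forward n_steps, sampling one prediction per step and feeding it back."""
    value = start
    path = []
    for _ in range(n_steps):
        # predict_dist returns a list of (candidate_value, probability) pairs.
        candidates, probs = zip(*predict_dist(value))
        value = rng.choices(candidates, weights=probs, k=1)[0]
        path.append(value)
    return path


def monte_carlo_forecast(predict_dist, start, n_steps, n_runs=1000, seed=0):
    """Average many sampled paths to estimate the expected value at each step."""
    rng = random.Random(seed)
    totals = [0.0] * n_steps
    for _ in range(n_runs):
        for i, v in enumerate(sample_path(predict_dist, start, n_steps, rng)):
            totals[i] += v
    return [t / n_runs for t in totals]


def toy_predict_dist(value):
    # Hypothetical one-step predictor: three candidates with fixed probabilities.
    return [(value - 1.0, 0.25), (value, 0.5), (value + 1.0, 0.25)]


forecast = monte_carlo_forecast(toy_predict_dist, start=10.0, n_steps=5)

# With 3 candidates per step, enumerating every path for 5 steps means 3**5 paths.
n_exhaustive_paths = 3 ** 5
```

Sampling a fixed number of runs costs n_runs * n_steps predictor calls,
whereas enumerating every path costs on the order of 3 ** n_steps in this toy
setup. The trade-off is that a sampled estimate is noisy, and covering the
possibilities well may need more runs as the horizon gets longer.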

For predicting peak load, I think your current approach is pretty good. The
big drawback, as you mentioned, is that it reduces the number of data points
by a factor of 48. How much data do you have? Internally we use a rule of
thumb where we like to have at least a thousand records to get decent
results.

The other possible approach is to create a 48-step-ahead model and feed it
half-hourly data (swarm on this configuration if possible). Then you can
accumulate the predictions as you go along. So, by midnight Tuesday, you
should have all the predictions for Wednesday and can take the peak
one. This will allow you to use all the data. You can use the same
approach for 2 days ahead, etc. I'm not actually sure if this will do
better than your approach, but thought I'd throw it out there.
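
The accumulation step could look roughly like the sketch below. It is plain
Python rather than NuPIC code, and it assumes a hypothetical input format:
pairs of (step index, predicted load), where step 0 is the first half hour of
day 0 and each prediction is for the step 48 records later. The function and
variable names are made up for illustration.

```python
from collections import defaultdict

STEPS_PER_DAY = 48  # half-hourly records


def accumulate_daily_peaks(predictions, steps_ahead=STEPS_PER_DAY):
    """Group k-step-ahead predictions by target day and return each full day's peak.

    predictions: iterable of (t, predicted_load), where the prediction made at
    step t refers to step t + steps_ahead.
    """
    by_day = defaultdict(list)
    for t, load in predictions:
        target_step = t + steps_ahead
        by_day[target_step // STEPS_PER_DAY].append(load)
    # Only report days for which all 48 half-hourly predictions have arrived.
    return {
        day: max(loads)
        for day, loads in by_day.items()
        if len(loads) == STEPS_PER_DAY
    }


# Toy data: predictions made during all of day 0 (targets fall in day 1),
# plus the first 10 steps of day 1 (targets fall in day 2, still incomplete).
toy_predictions = [(t, 100.0 + t) for t in range(48)]
toy_predictions += [(t, 90.0) for t in range(48, 58)]

peaks = accumulate_daily_peaks(toy_predictions)
```

Here `peaks` holds an entry only for day 1, since day 2 has just 10 of its 48
predictions so far. Passing steps_ahead=96 would do the same accumulation for
a model that predicts 2 days ahead.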

--Subutai



On Fri, Oct 4, 2013 at 6:04 AM, Pedro Tabacof <[email protected]> wrote:

> Hello Subutai,
>
> Since it was quite easy to do, I ended up trying to feed the
> prediction back to the input. While the results were worse than doing
> 31-step or 1,...,31-step predictions, they weren't terrible. Like you said,
> the simulation degraded with time, but in the end it was still within an
> acceptable range. Maybe it'd be interesting to research this problem under
> a Monte Carlo approach, repeating the simulation many times using different
> predictions and calculating the expectation of the final prediction.
>
> I raised this question because for this problem I have to predict the max
> energy load of each day. However, I have half-hourly data, so I'm actually
> discarding a lot of samples to feed the CLA just the max load of each
> day. My idea is to use the half-hourly data and then do this prediction
> feedback so I can predict the half-hourly energy load for the whole month,
> and then I can take the max load of each day by hand. I still haven't done
> this because it is going to be much more challenging, but it is worth a
> shot even if it is just for "scientific" reasons.
>
> Do you have any idea how to use the half-hourly data in a sensible
> way?
>
> Your suggestion to do swarming on 31 different models is great. I was just
> stuck thinking of doing only the 1,...,31-step predictions with one single
> model, but as you said the classifier uses a lot of memory this way and
> ends up being much slower than it'd be with separate models. I will try to
> get swarming running on the VM and then try to do this; it seems like the
> best shot for a good result.
>
> Thanks a lot, it was really helpful!
>
> Pedro.
>
>
> On Thu, Oct 3, 2013 at 5:32 PM, Subutai Ahmad <[email protected]> wrote:
>
>> Hi Pedro,
>>
>> That's encouraging news!  Having your results documented will be really
>> helpful to everyone.  Here's an attempt to answer your main question:
>>
>> 1) My feeling is similar to yours - in general I don't think recursively
>> feeding in classifier predictions is a good idea for predicting many steps
>> ahead. There are multiple predictions made at each time step. These
>> predictions branch into the future and weird things can happen. Suppose we
>> fed in the most likely prediction at each time step.  Here's a simple
>> failure case:
>>
>> A  -> B (0.4) -> D (0.1)
>>  |---> C (0.3) -> E (1.0)
>>
>> In this data, after A you get B with 40% chance and C with 30% chance.
>> After B the most likely element is D but it only has 10% chance. E always
>> follows C with 100% probability.  If you feed the most likely prediction
>> from A back into the system, you would predict D two steps ahead. However,
>> E is a better 2-step prediction starting from A: the path through C to E
>> has probability 0.3 x 1.0 = 0.3, while the path through B to D has only
>> 0.4 x 0.1 = 0.04.
>>
>> Other issues can happen. Quite often the probabilities for the various
>> predictions are quite similar. If you just follow the most likely path then
>> a small mistake (e.g. a small amount of noise) could throw it off.   If you
>> could somehow feed in all the probabilities at each time step then maybe
>> you can do a better job but that would be a lot more involved and I'm not
>> really sure how to do it with CLA.
>>
>>
>> For multi step predictions we have tried the following options:
>>
>> a) For x=1 .. 31, train 31 different models, each predicting x steps
>> ahead. Each model is swarmed specifically for x.  This gives the best
>> results since the parameters for predicting one month into the future could
>> be different from 1 day into the future. It sounds similar to what you did
>> except for custom swarming. Unfortunately, this is the most time consuming
>> because of the swarming step. Once you get swarming working, you might want
>> to try this with just one 7 step ahead model and see if that is better than
>> your current 7 step model.
>>
>> b) Train one model to predict 31 days ahead and accumulate the results to
>> get all the predictions. So, tomorrow's prediction would have been made 30
>> days ago by this model. Surprisingly, in some situations with very regular
>> data this works pretty well.  Quite often it's not as good as a).
>>
>> c) A combination of the above. For example, train 3 models to predict 1
>> day, 7 days, and 31 days in advance. Accumulate using the closest models.
>> This is a compromise that can work well.
>>
>> d) Train a single model to predict 1, 2, 3, …, 31 steps ahead (i.e. all
>> of them). You can do this by specifying a list of steps for steps ahead.
>> We've had problems with this though.  The classifier can take up a lot of
>> memory in this setup. Also, often a single set of parameters doesn't work
>> well for all time ranges.
>>
>>
>> Other questions:
>>
>> 2) It should. Scott might know better.
>>
>> 3) I don't know - again Scott might know this. If I remember correctly
>> finishLearning is just an optimization step so you can ignore it. Turning
>> learning off with disableLearning should work for testing.
>>
>> 4) Yes, you can run swarming within the VM. The main extra step is that
>> you need to install MySQL. There is a test script you can run with "python
>> examples/swarm/test_db.py" to check that the DB is working. If that works,
>> swarming should work. See
>> https://github.com/numenta/nupic/wiki/Running-Swarms for details.
>>
>> This ended up being a really long email!  Hopefully it was helpful.
>>
>> --Subutai
>>
>>
>>
>> On Thu, Oct 3, 2013 at 9:13 AM, Pedro Tabacof <[email protected]> wrote:
>>
>>> Matt, I haven't uploaded my code anywhere yet. I'd like to try a
>>> few more things (which depend on the questions I asked) before I do this
>>> because I know when I upload the code and post the results here I probably
>>> won't try to improve or change anything. I only work well under pressure
>>> lol.
>>>
>>> Since I'm gonna be away this weekend, I hope that by the end of next
>>> week I will set up a github page with everything (explanation of the
>>> problem, dataset, code and results with competition comparisons).
>>>
>>> Pedro.
>>>
>>>
>>> On Thu, Oct 3, 2013 at 12:56 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> Pedro, this is exciting! Is your code available online anywhere? Any
>>>> chance you can put it up on github or bitbucket?
>>>>
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>>
>>>> On Thu, Oct 3, 2013 at 6:59 AM, Pedro Tabacof <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I've been working with an energy competition dataset [1] and I've been
>>>>> experimenting with some different ways to predict many steps ahead (I have
>>>>> to predict 31 different energy loads for the whole month). This led me to
>>>>> some questions:
>>>>>
>>>>> 1) Has anyone tried feeding one-step classifier predictions back to
>>>>> the input? This can be done easily by hand but I'm not sure if this is a
>>>>> good idea for many-step prediction.
>>>>>
>>>>> 2) Does "disableLearning" also turn off classifier learning? If not,
>>>>> how do I do it?
>>>>>
>>>>> 3) Is "finishLearning" deprecated? I tried using it but I got an error
>>>>> message.
>>>>>
>>>>> 4) Is it possible to run swarming within the Vagrant VM? What about
>>>>> Cerebro?
>>>>>
>>>>> On a side note, so far I have achieved 3.3% MAPE on the test data,
>>>>> which would put me among the top 10 competitors (out of 26), with pretty
>>>>> much the basic NuPIC configuration, very similar to the hotgym example.
>>>>>
>>>>> I have experimented with 31-step predictions and 1,2,3,...,31-step
>>>>> predictions, but this was too slow and didn't improve the results. When I
>>>>> finish testing all my ideas, I will post my results and experience here.
>>>>>
>>>>> Pedro.
>>>>>
>>>>> [1] http://neuron.tuke.sk/competition/index.php
>>>>> --
>>>>> Pedro Tabacof,
>>>>> Unicamp - Eng. de Computação 08.
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Pedro Tabacof,
>>> Unicamp - Eng. de Computação 08.
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Pedro Tabacof,
> Unicamp - Eng. de Computação 08.
>
>
>