Re: [nupic-dev] Many-steps prediction

Pedro Tabacof Sat, 05 Oct 2013 15:37:20 -0700

Hello Subutai,

I have two years' worth of data, which means 730 daily max loads and 35,040
half-hourly loads. Besides having only 730 samples, another problem is that
the data is highly seasonal: the competition winners actually discarded
summer data since the prediction target was only January.


I'm having some problems with swarming:

1) I've tried many different naming schemes but run_swarm.py never finds my
data file. The only way I managed was to rename my file to "hotgym.csv" and
use the same path as the "simple" example.

2) What is the expected datetime format? Is there a way to change it? I
just cannot set my Excel to write dates as YYYY-MM-DD hh:mm:ss and I'm
using MM/DD/YYYY.
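
One possible workaround is to rewrite the timestamps outside of Excel before
swarming. The lines below are only a sketch in plain Python 3: the name
reformat_timestamp and the input layout (MM/DD/YYYY followed by a 24-hour
hh:mm time) are assumptions, and it is not confirmed here that
YYYY-MM-DD hh:mm:ss is what the swarm actually expects.

```python
from datetime import datetime

# Assumed input layout: MM/DD/YYYY plus a 24-hour time, e.g. "10/05/2013 15:30".
INPUT_FORMAT = "%m/%d/%Y %H:%M"
# Target layout mentioned above: YYYY-MM-DD hh:mm:ss.
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def reformat_timestamp(text, input_format=INPUT_FORMAT):
    """Parse a timestamp in the assumed input layout and re-emit it in the target layout."""
    return datetime.strptime(text, input_format).strftime(OUTPUT_FORMAT)


print(reformat_timestamp("10/05/2013 15:30"))  # 2013-10-05 15:30:00
```

If the file only has dates and no time of day, passing
input_format="%m/%d/%Y" should give midnight for every row.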

I don't know if it's related to (2), but my swarming fails with:

ERROR MESSAGE: Exception occurred while running model 1139:
KeyError('load',) (<type 'exceptions.KeyError'>)

("load" is the prediction objective)


Thanks again!
Pedro.



On Fri, Oct 4, 2013 at 9:43 PM, Subutai Ahmad wrote:

> Hi Pedro,
>
> Doing Monte Carlo simulation is a great idea for multi-steps. I guess one
> concern is that the number of possibilities grows exponentially the longer
> you look into the future. The simulation time will similarly grow
> exponentially. Still, for a small number of steps it could work well.
>
> For predicting peak load, I think your current approach is pretty good.
> The big drawback as you mentioned is that it reduces the number of data
> points by a factor of 48. How much data do you have? Internally we use a
> rule of thumb where we like to have at least a thousand records to get
> decent results.
>
> The other possible approach is to create a 48-step ahead model and feed it
> half hour data (swarm on this configuration if possible). Then you can
> accumulate the predictions as you go along. So, by midnight Tuesday, you
> should have all the predictions for Wednesday and you can take the peak
> one.  This will allow you to use all the data. You can use the same
> approach for 2 days ahead, etc. I'm not actually sure if this will do
> better than your approach, but thought I'd throw it out there.
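
The accumulation step described above could look roughly like this sketch
(plain Python 3, no NuPIC calls). It assumes each 48-step-ahead prediction is
stored together with the time it was made; daily_peaks, STEP and STEPS_AHEAD
are hypothetical names, and the loads in the usage lines are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

STEP = timedelta(minutes=30)  # half-hourly data
STEPS_AHEAD = 48              # 48 half-hour steps = 24 hours ahead


def daily_peaks(predictions):
    """Group 48-step-ahead predictions by the day they refer to and keep the max.

    predictions: iterable of (time the prediction was made, predicted load).
    Returns a dict mapping target date -> peak predicted load for that date.
    """
    peaks = defaultdict(lambda: float("-inf"))
    for made_at, predicted_load in predictions:
        target_day = (made_at + STEPS_AHEAD * STEP).date()
        peaks[target_day] = max(peaks[target_day], predicted_load)
    return dict(peaks)


# Made-up predictions issued every half hour on Tuesday 2013-10-01;
# each one refers to the same time of day on Wednesday 2013-10-02.
tuesday = datetime(2013, 10, 1, 0, 0)
example = [(tuesday + i * STEP, 100.0 + i) for i in range(48)]
print(daily_peaks(example))
```

The same grouping should work for two days ahead by setting STEPS_AHEAD to 96.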
>
> --Subutai
>
>
>
> On Fri, Oct 4, 2013 at 6:04 AM, Pedro Tabacof wrote:
>
>> Hello Subutai,
>>
>> Since it was quite easy to do, I ended up trying to feed the
>> prediction back to the input. While the results were worse than doing
>> 31-step or 1,...,31-step predictions, they weren't terrible. Like you said,
>> the simulation degraded with time, but in the end it was still within an
>> acceptable range. Maybe it'd be interesting to research this problem under
>> a Monte Carlo approach, repeating the simulation many times using different
>> predictions and calculating the expectation of the final prediction.
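
A rough sketch of that Monte Carlo idea, in plain Python 3 with no NuPIC
calls. Everything here is an assumption for illustration: predict_distribution
stands in for whatever returns the classifier's (value, probability) pairs,
and toy_predictor is a made-up random walk, not a real model.

```python
import random


def monte_carlo_forecast(start, predict_distribution, steps, n_runs=1000, seed=0):
    """Estimate the expected value `steps` ahead by sampled prediction feedback.

    predict_distribution(value) must return a list of (next_value, probability)
    pairs. Each run samples one next value per step and feeds it back in.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        value = start
        for _ in range(steps):
            outcomes, weights = zip(*predict_distribution(value))
            value = rng.choices(outcomes, weights=weights, k=1)[0]
        total += value
    return total / n_runs


def toy_predictor(load):
    """Hypothetical stand-in: the load goes up or down by 1 with equal chance."""
    return [(load + 1.0, 0.5), (load - 1.0, 0.5)]


print(monte_carlo_forecast(100.0, toy_predictor, steps=31))
```

Sampling from the distribution at each step, instead of always taking the most
likely value, is what distinguishes this from the plain feedback loop.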
>>
>> I raised this question because on this problem I have to predict the max
>> energy load of each day, however I have half-hourly data, so I'm actually
>> discarding a lot of samples to feed to the CLA just the max load of each
>> day. My idea is to use the half-hourly data and then do this prediction
>> feedback so I can predict the half-hourly energy load for the whole month,
>> and then I can take the max load of each day by hand. I still haven't done
>> this because this is gonna be much more challenging, but it is worth the
>> shot even if it is just for "scientific" reasons.
>>
>> Do you have any idea on how to use the half-hourly data in a sensible
>> way?
>>
>> Your suggestion to do swarming on 31 different models is great, I was
>> just stuck thinking of doing only the 1,...,31-step predictions with one
>> single model, but as you said the classifier uses a lot of memory this
>> way and ends up being much slower than it'd be with separate models. I will
>> try to get swarming running on the VM and then try to do this, it seems
>> like the best shot for a good result.
>>
>> Thanks a lot, it was really helpful!
>>
>> Pedro.
>>
>>
>> On Thu, Oct 3, 2013 at 5:32 PM, Subutai Ahmad wrote:
>>
>>> Hi Pedro,
>>>
>>> That's encouraging news!  Having your results documented will be really
>>> helpful to everyone.  Here's an attempt to answer your main question:
>>>
>>> 1) My feeling is similar to yours - in general I don't think recursively
>>> feeding in classifier predictions is a good idea for predicting many steps
>>> ahead. There are multiple predictions made at each time step. These
>>> predictions branch into the future and weird things can happen. Suppose we
>>> fed in the most likely prediction at each time step.  Here's a simple
>>> failure case:
>>>
>>> A  -> B (0.4) -> D (0.1)
>>>  |---> C (0.3) -> E (1.0)
>>>
>>> In this data, after A you get B with 40% chance and C with 30% chance.
>>> After B the most likely element is D but it only has 10% chance. E always
>>> follows C with 100% probability.  If you feed the most likely prediction
>>> from A back into the system, you would predict D two steps ahead. However,
>>> E is a better 2-step prediction starting from A.
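
The failure case above can be checked numerically. This is a small sketch in
plain Python 3 that only encodes the probabilities shown in the diagram;
successors the diagram does not list are left out, and the function names are
made up for illustration.

```python
# Transition probabilities taken from the diagram; unlisted successors omitted.
transitions = {
    "A": {"B": 0.4, "C": 0.3},
    "B": {"D": 0.1},
    "C": {"E": 1.0},
}


def greedy_two_step(start):
    """Follow the single most likely prediction twice (prediction feedback)."""
    first = max(transitions[start], key=transitions[start].get)
    return max(transitions[first], key=transitions[first].get)


def two_step_distribution(start):
    """Sum probability over every two-step path instead of following one."""
    dist = {}
    for mid, p_mid in transitions[start].items():
        for end, p_end in transitions[mid].items():
            dist[end] = dist.get(end, 0.0) + p_mid * p_end
    return dist


dist = two_step_distribution("A")
print("greedy feedback predicts:", greedy_two_step("A"))
print("most likely after two steps:", max(dist, key=dist.get))
print("two-step probabilities:", dist)
```

Feedback ends up at D, which is reached with probability 0.4 * 0.1 = 0.04,
while E is reached with probability 0.3 * 1.0 = 0.3. Any unlisted successor of
B has at most a 10% chance, so it could not beat E either.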
>>>
>>> Other issues can happen. Quite often the probabilities for the various
>>> predictions are quite similar. If you just follow the most likely path then
>>> a small mistake (e.g. a small amount of noise) could throw it off.   If you
>>> could somehow feed in all the probabilities at each time step then maybe
>>> you can do a better job but that would be a lot more involved and I'm not
>>> really sure how to do it with CLA.
>>>
>>>
>>> For multi step predictions we have tried the following options:
>>>
>>> a) For x=1 .. 31, train 31 different models, each predicting x steps
>>> ahead. Each model is swarmed specifically for x.  This gives the best
>>> results since the parameters for predicting one month into the future could
>>> be different from 1 day into the future. It sounds similar to what you did
>>> except for custom swarming. Unfortunately, this is the most time consuming
>>> because of the swarming step. Once you get swarming working, you might want
>>> to try this with just one 7 step ahead model and see if that is better than
>>> your current 7 step model.
>>>
>>> b) Train one model to predict 31 days ahead and accumulate the results
>>> to get all the predictions. So, tomorrow's prediction would have been made
>>> 30 days ago by this model. Surprisingly, in some situations with very
>>> regular data this works pretty well.  Quite often it's not as good as a).
>>>
>>> c) A combination of the above. For example, train 3 models to predict 1
>>> day, 7 days, and 31 days in advance. Accumulate using the closest models.
>>> This is a compromise that can work well.
>>>
>>> d) Train a single model to predict 1, 2, 3, …, 31 steps ahead (i.e. all
>>> of them). You can do this by specifying a list of steps for steps ahead.
>>> We've had problems with this though.  The classifier can take up a lot of
>>> memory in this setup. Also, often a single set of parameters doesn't work
>>> well for all time ranges.
>>>
>>>
>>> Other questions:
>>>
>>> 2) It should. Scott might know better.
>>>
>>> 3) I don't know - again Scott might know this. If I remember correctly
>>> finishLearning is just an optimization step so you can ignore it. Turning
>>> learning off with disableLearning should work for testing.
>>>
>>> 4) Yes, you can run swarming within the VM. The main extra step is that
>>> you need to install MySQL. There is a test script in "python
>>> examples/swarm/test_db.py" to test that the DB is working. If that works
>>> swarming should work. See
>>> https://github.com/numenta/nupic/wiki/Running-Swarms for details.
>>>
>>> This ended up being a really long email!  Hopefully it was helpful.
>>>
>>> --Subutai
>>>
>>>
>>>
>>> On Thu, Oct 3, 2013 at 9:13 AM, Pedro Tabacof wrote:
>>>
>>>> Matt, I haven't uploaded my code anywhere yet. I'd like to try a
>>>> few more things (which depend on the questions I asked) before I do this
>>>> because I know when I upload the code and post the results here I probably
>>>> won't try to improve or change anything. I only work well under pressure
>>>> lol.
>>>>
>>>> Since I'm gonna be away this weekend, I hope that by the end of next
>>>> week I will set up a github page with everything (explanation of the
>>>> problem, dataset, code and results with competition comparisons).
>>>>
>>>> Pedro.
>>>>
>>>>
>>>> On Thu, Oct 3, 2013 at 12:56 PM, Matthew Taylor wrote:
>>>>
>>>>> Pedro, this is exciting! Is your code available online anywhere? Any
>>>>> chance you can put it up on github or bitbucket?
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>>
>>>>> On Thu, Oct 3, 2013 at 6:59 AM, Pedro Tabacof wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've been working with an energy competition dataset [1] and I've
>>>>>> been experimenting with some different ways to predict many steps
>>>>>> ahead (I have to predict 31 different energy loads for the whole
>>>>>> month). This led me to some questions:
>>>>>>
>>>>>> 1) Has anyone tried feeding one-step classifier predictions back to
>>>>>> the input? This can be done easily by hand but I'm not sure if this is a
>>>>>> good idea for many steps prediction.
>>>>>>
>>>>>> 2) Does "disableLearning" also turn off classifier learning? If not,
>>>>>> how do I do it?
>>>>>>
>>>>>> 3) Is "finishLearning" deprecated? I tried using it but I got an
>>>>>> error message.
>>>>>>
>>>>>> 4) Is it possible to run swarming within the Vagrant VM? What about
>>>>>> Cerebro?
>>>>>>
>>>>>> On a side note, so far I have achieved 3.3% MAPE on the test data,
>>>>>> which would put me among the top 10 competitors (out of 26), with pretty
>>>>>> much the basic NuPIC configuration, very similar to the hotgym example.
>>>>>>
>>>>>> I have experimented with 31-step predictions and 1,2,3,...,31
>>>>>> predictions, but this was too slow and didn't improve the results. When I
>>>>>> finish testing all my ideas, I will post my results and experience here.
>>>>>>
>>>>>> Pedro.
>>>>>>
>>>>>> [1] http://neuron.tuke.sk/competition/index.php
>>>>>> --
>>>>>> Pedro Tabacof,
>>>>>> Unicamp - Eng. de Computação 08.
>>>>>>
>>>>>> _______________________________________________
>>>>>> nupic mailing list
>>>>>> [email protected]
>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

-- 
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
