Just a quick update: I've managed to set the timestamp field to the same format as the hotgym example, but now I'm getting this error:
Model Exception: Exception occurred while running model 1146: Exception(u'No such input field: load',) (<type 'exceptions.Exception'>)

Pedro.

On Sat, Oct 5, 2013 at 7:36 PM, Pedro Tabacof <[email protected]> wrote:
> Hello Subutai,
>
> I have two years' worth of data, so that means 730 max loads and 35040
> half-hourly loads. Besides leaving only 730 samples, another problem is
> that the data is highly seasonal: the competition winners actually
> discarded summer data, since the prediction target was only January.
>
> I'm having some problems with swarming:
>
> 1) I've tried many different naming schemes, but run_swarm.py never finds
> my data file. The only way I managed was to rename my file to "hotgym.csv"
> and use the same path as the "simple" example.
>
> 2) What is the expected datetime format? Is there a way to change it? I
> just cannot get Excel to write dates as YYYY-MM-DD hh:mm:ss, so I'm
> using MM/DD/YYYY.
>
> I don't know if it's related to (2), but my swarming fails with:
>
> ERROR MESSAGE: Exception occurred while running model 1139:
> KeyError('load',) (<type 'exceptions.KeyError'>)
>
> ("load" is the prediction objective)
>
> Thanks again!
> Pedro.
>
> On Fri, Oct 4, 2013 at 9:43 PM, Subutai Ahmad <[email protected]> wrote:
>> Hi Pedro,
>>
>> Doing Monte Carlo simulation is a great idea for multi-step prediction. I
>> guess one concern is that the number of possibilities grows exponentially
>> the longer you look into the future, and the simulation time will grow
>> similarly. Still, for a small number of steps it could work well.
>>
>> For predicting peak load, I think your current approach is pretty good.
>> The big drawback, as you mentioned, is that it reduces the number of data
>> points by a factor of 48. How much data do you have? Internally we use a
>> rule of thumb that we like to have at least a thousand records to get
>> decent results.
>>
>> The other possible approach is to create a 48-step-ahead model and feed
>> it half-hour data (swarm on this configuration if possible). Then you can
>> accumulate the predictions as you go along. So, by midnight Tuesday, you
>> should have all the predictions for Wednesday and you can take the peak
>> one. This will allow you to use all the data. You can use the same
>> approach for 2 days ahead, etc. I'm not actually sure if this will do
>> better than your approach, but thought I'd throw it out there.
>>
>> --Subutai
>>
>> On Fri, Oct 4, 2013 at 6:04 AM, Pedro Tabacof <[email protected]> wrote:
>>> Hello Subutai,
>>>
>>> Since it was quite easy to do, I ended up trying to feed the
>>> prediction back to the input. While the results were worse than doing
>>> 31-step or 1,...,31-step predictions, it wasn't terrible. Like you said,
>>> the simulation degraded with time, but in the end it was still within an
>>> acceptable range. Maybe it'd be interesting to research this problem with
>>> a Monte Carlo approach, repeating the simulation many times using
>>> different predictions and calculating the expectation of the final
>>> prediction.
>>>
>>> I raised this question because in this problem I have to predict the max
>>> energy load of each day; however, I have half-hourly data, so I'm
>>> actually discarding a lot of samples to feed the CLA just the max load
>>> of each day. My idea is to use the half-hourly data and then do this
>>> prediction feedback so I can predict the half-hourly energy load for the
>>> whole month, and then take the max load of each day by hand. I still
>>> haven't done this because it's going to be much more challenging, but
>>> it's worth a shot even if just for "scientific" reasons.
>>>
>>> Do you have any idea of how to use the half-hourly data in a sensible
>>> way?
>>>
>>> Your suggestion to do swarming on 31 different models is great. I was
>>> stuck thinking of doing only the 1,...,31-step predictions with one
>>> single model, but as you said, the classifier uses a lot of memory this
>>> way and ends up being much slower than it'd be with separate models. I
>>> will try to get swarming running on the VM and then try this; it seems
>>> like the best shot at a good result.
>>>
>>> Thanks a lot, it was really helpful!
>>>
>>> Pedro.
>>>
>>> On Thu, Oct 3, 2013 at 5:32 PM, Subutai Ahmad <[email protected]> wrote:
>>>> Hi Pedro,
>>>>
>>>> That's encouraging news! Having your results documented will be really
>>>> helpful to everyone. Here's an attempt to answer your main question:
>>>>
>>>> 1) My feeling is similar to yours: in general I don't think recursively
>>>> feeding in classifier predictions is a good idea for predicting many
>>>> steps ahead. There are multiple predictions made at each time step.
>>>> These predictions branch into the future, and weird things can happen.
>>>> Suppose we fed in the most likely prediction at each time step. Here's
>>>> a simple failure case:
>>>>
>>>> A -> B (0.4) -> D (0.1)
>>>> |---> C (0.3) -> E (1.0)
>>>>
>>>> In this data, after A you get B with 40% chance and C with 30% chance.
>>>> After B the most likely element is D, but it only has a 10% chance. E
>>>> always follows C with 100% probability. If you feed the most likely
>>>> prediction from A back into the system, you would predict D two steps
>>>> ahead. However, E is a better 2-step prediction starting from A.
>>>>
>>>> Other issues can happen. Quite often the probabilities for the various
>>>> predictions are quite similar. If you just follow the most likely path,
>>>> then a small mistake (e.g. a small amount of noise) could throw it off.
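The failure case above is easy to verify numerically: greedily feeding the most likely one-step prediction back in lands on D, while summing probabilities over all two-step paths shows E is the better prediction. This is plain Python over the toy transition table, not CLA code:

```python
# One-step transition probabilities from the failure case above.
TRANS = {
    "A": {"B": 0.4, "C": 0.3},
    "B": {"D": 0.1},
    "C": {"E": 1.0},
}

def greedy_two_step(start):
    """Feed the most likely one-step prediction back in, twice."""
    mid = max(TRANS[start], key=TRANS[start].get)
    return max(TRANS[mid], key=TRANS[mid].get)

def best_two_step(start):
    """True two-step distribution: sum probabilities over all paths."""
    probs = {}
    for mid, p1 in TRANS[start].items():
        for end, p2 in TRANS.get(mid, {}).items():
            probs[end] = probs.get(end, 0.0) + p1 * p2
    return max(probs, key=probs.get)

print(greedy_two_step("A"))  # D -- greedy feedback follows A -> B -> D
print(best_two_step("A"))    # E -- P(E) = 0.3 beats P(D) = 0.04
```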
>>>> If you could somehow feed in all the probabilities at each time step,
>>>> then maybe you could do a better job, but that would be a lot more
>>>> involved and I'm not really sure how to do it with the CLA.
>>>>
>>>> For multi-step predictions we have tried the following options:
>>>>
>>>> a) For x = 1..31, train 31 different models, each predicting x steps
>>>> ahead. Each model is swarmed specifically for x. This gives the best
>>>> results, since the parameters for predicting one month into the future
>>>> could be different from those for one day. It sounds similar to what
>>>> you did, except for the custom swarming. Unfortunately, this is the
>>>> most time-consuming option because of the swarming step. Once you get
>>>> swarming working, you might want to try this with just one 7-step-ahead
>>>> model and see if that is better than your current 7-step model.
>>>>
>>>> b) Train one model to predict 31 days ahead and accumulate the results
>>>> to get all the predictions. So, tomorrow's prediction would have been
>>>> made 30 days ago by this model. Surprisingly, in some situations with
>>>> very regular data this works pretty well. Quite often it's not as good
>>>> as a).
>>>>
>>>> c) A combination of the above. For example, train 3 models to predict
>>>> 1 day, 7 days, and 31 days in advance. Accumulate using the closest
>>>> models. This is a compromise that can work well.
>>>>
>>>> d) Train a single model to predict 1, 2, 3, ..., 31 steps ahead (i.e.
>>>> all of them). You can do this by specifying a list of steps for the
>>>> steps-ahead parameter. We've had problems with this, though. The
>>>> classifier can take up a lot of memory in this setup. Also, often a
>>>> single set of parameters doesn't work well for all time ranges.
>>>>
>>>> Other questions:
>>>>
>>>> 2) It should. Scott might know better.
>>>>
>>>> 3) I don't know -- again, Scott might know this. If I remember
>>>> correctly, finishLearning is just an optimization step, so you can
>>>> ignore it.
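For option (d), the list of steps goes into the classifier parameters. In hotgym-style OPF model params this is the `'steps'` entry of `modelParams['clParams']`, given as a comma-separated string; the region name and alpha value below follow that example and may differ across NuPIC versions, so treat this as a sketch rather than a drop-in config:

```python
# Sketch of option (d): one classifier predicting 1..31 steps ahead.
steps = ",".join(str(i) for i in range(1, 32))  # "1,2,...,31"

CL_PARAMS = {
    "regionName": "CLAClassifierRegion",  # hotgym-era name; later versions use SDRClassifierRegion
    "verbosity": 0,
    "alpha": 0.005,  # hotgym default, not tuned for this dataset
    "steps": steps,  # memory use grows with every extra horizon
}
```

Each horizon in `steps` keeps its own prediction history inside the classifier, which is why memory becomes a problem in this setup.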
>>>> Turning learning off with disableLearning should work for testing.
>>>>
>>>> 4) Yes, you can run swarming within the VM. The main extra step is
>>>> that you need to install MySQL. There is a test script, "python
>>>> examples/swarm/test_db.py", to check that the DB is working. If that
>>>> works, swarming should work. See
>>>> https://github.com/numenta/nupic/wiki/Running-Swarms for details.
>>>>
>>>> This ended up being a really long email! Hopefully it was helpful.
>>>>
>>>> --Subutai
>>>>
>>>> On Thu, Oct 3, 2013 at 9:13 AM, Pedro Tabacof <[email protected]> wrote:
>>>>> Matt, I haven't uploaded my code anywhere yet. I'd like to try a few
>>>>> more things (which depend on the questions I asked) before I do,
>>>>> because I know that once I upload the code and post the results here
>>>>> I probably won't try to improve or change anything. I only work well
>>>>> under pressure, lol.
>>>>>
>>>>> Since I'm going to be away this weekend, I hope that by the end of
>>>>> next week I will set up a GitHub page with everything (explanation of
>>>>> the problem, dataset, code, and results with competition comparisons).
>>>>>
>>>>> Pedro.
>>>>>
>>>>> On Thu, Oct 3, 2013 at 12:56 PM, Matthew Taylor <[email protected]> wrote:
>>>>>> Pedro, this is exciting! Is your code available online anywhere? Any
>>>>>> chance you can put it up on GitHub or Bitbucket?
>>>>>>
>>>>>> ---------
>>>>>> Matt Taylor
>>>>>> OS Community Flag-Bearer
>>>>>> Numenta
>>>>>>
>>>>>> On Thu, Oct 3, 2013 at 6:59 AM, Pedro Tabacof <[email protected]> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I've been working with an energy competition dataset [1] and I've
>>>>>>> been experimenting with some different ways to predict many steps
>>>>>>> ahead (I have to predict 31 different energy loads for the whole
>>>>>>> month). This led me to some questions:
>>>>>>>
>>>>>>> 1) Has anyone tried feeding one-step classifier predictions back to
>>>>>>> the input?
>>>>>>> This can be done easily by hand, but I'm not sure if it is a good
>>>>>>> idea for many-step prediction.
>>>>>>>
>>>>>>> 2) Does "disableLearning" also turn off classifier learning? If
>>>>>>> not, how do I do it?
>>>>>>>
>>>>>>> 3) Is "finishLearning" deprecated? I tried using it but got an
>>>>>>> error message.
>>>>>>>
>>>>>>> 4) Is it possible to run swarming within the Vagrant VM? What about
>>>>>>> Cerebro?
>>>>>>>
>>>>>>> On a side note, so far I have achieved 3.3% MAPE on the test data,
>>>>>>> which would put me among the top 10 competitors (out of 26), with
>>>>>>> pretty much the basic NuPIC configuration, very similar to the
>>>>>>> hotgym example.
>>>>>>>
>>>>>>> I have experimented with 31-step predictions and 1,2,3,...,31-step
>>>>>>> predictions, but this was too slow and didn't improve the results.
>>>>>>> When I finish testing all my ideas, I will post my results and
>>>>>>> experience here.
>>>>>>>
>>>>>>> Pedro.
>>>>>>>
>>>>>>> [1] http://neuron.tuke.sk/competition/index.php
>>>>>>> --
>>>>>>> Pedro Tabacof,
>>>>>>> Unicamp - Eng. de Computação 08.

--
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
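For reference, the 3.3% MAPE figure above is the mean absolute percentage error. A quick sketch; the numbers below are made-up illustration data, not the competition loads:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (zero actuals skipped)."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)

print(mape([100.0, 200.0, 400.0], [97.0, 206.0, 400.0]))  # ~2.0
```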
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
