Just a quick update: I've managed to set the timestamp field to the same format as the hotgym example, but now I'm getting this error:
Model Exception: Exception occurred while running model 1146: Exception(u'No such input field: load',) (<type 'exceptions.Exception'>)

Pedro.

On Sat, Oct 5, 2013 at 7:36 PM, Pedro Tabacof <[email protected]> wrote:
> Hello Subutai,
>
> I have two years' worth of data, so that means 730 max loads and 35040
> half-hourly loads. Besides leaving only 730 samples, another problem is
> that the data is highly seasonal: the competition winners actually
> discarded summer data, since the prediction target was only January.
>
> I'm having some problems with swarming:
>
> 1) I've tried many different naming schemes, but run_swarm.py never finds
> my data file. The only way I managed was to rename my file to "hotgym.csv"
> and use the same path as the "simple" example.
>
> 2) What is the expected datetime format? Is there a way to change it? I
> just cannot get Excel to write dates as YYYY-MM-DD hh:mm:ss, so I'm
> using MM/DD/YYYY.
>
> I don't know if it's related to (2), but my swarming fails with:
>
> ERROR MESSAGE: Exception occurred while running model 1139:
> KeyError('load',) (<type 'exceptions.KeyError'>)
>
> ("load" is the prediction objective)
>
> Thanks again!
> Pedro.
>
> On Fri, Oct 4, 2013 at 9:43 PM, Subutai Ahmad <[email protected]> wrote:
>> Hi Pedro,
>>
>> Doing Monte Carlo simulation is a great idea for multi-step prediction. I
>> guess one concern is that the number of possibilities grows exponentially
>> the longer you look into the future, and the simulation time will grow
>> similarly. Still, for a small number of steps it could work well.
>>
>> For predicting peak load, I think your current approach is pretty good.
>> The big drawback, as you mentioned, is that it reduces the number of data
>> points by a factor of 48. How much data do you have? Internally we use a
>> rule of thumb that we like to have at least a thousand records to get
>> decent results.
>>
>> The other possible approach is to create a 48-step-ahead model and feed
>> it half-hour data (swarm on this configuration if possible). Then you can
>> accumulate the predictions as you go along. So, by midnight Tuesday, you
>> should have all the predictions for Wednesday and you can take the peak
>> one. This will allow you to use all the data. You can use the same
>> approach for 2 days ahead, etc. I'm not actually sure if this will do
>> better than your approach, but thought I'd throw it out there.
>>
>> --Subutai
>>
>> On Fri, Oct 4, 2013 at 6:04 AM, Pedro Tabacof <[email protected]> wrote:
>>> Hello Subutai,
>>>
>>> Since it was quite easy to do, I ended up trying to feed the
>>> prediction back to the input. While the results were worse than doing
>>> 31-step or 1,...,31-step predictions, it wasn't terrible. Like you said,
>>> the simulation degraded with time, but in the end it was still within an
>>> acceptable range. Maybe it'd be interesting to research this problem with
>>> a Monte Carlo approach, repeating the simulation many times using
>>> different predictions and calculating the expectation of the final
>>> prediction.
>>>
>>> I raised this question because in this problem I have to predict the max
>>> energy load of each day; however, I have half-hourly data, so I'm
>>> actually discarding a lot of samples to feed the CLA just the max load
>>> of each day. My idea is to use the half-hourly data and then do this
>>> prediction feedback so I can predict the half-hourly energy load for the
>>> whole month, and then take the max load of each day by hand. I still
>>> haven't done this because it's going to be much more challenging, but
>>> it's worth a shot even if just for "scientific" reasons.
>>>
>>> Do you have any idea of how to use the half-hourly data in a sensible
>>> way?
>>>
>>> Your suggestion to do swarming on 31 different models is great. I was
>>> stuck thinking of doing only the 1,...,31-step predictions with one
>>> single model, but as you said, the classifier uses a lot of memory this
>>> way and ends up being much slower than it'd be with separate models. I
>>> will try to get swarming running on the VM and then try this; it seems
>>> like the best shot at a good result.
>>>
>>> Thanks a lot, it was really helpful!
>>>
>>> Pedro.
>>>
>>> On Thu, Oct 3, 2013 at 5:32 PM, Subutai Ahmad <[email protected]> wrote:
>>>> Hi Pedro,
>>>>
>>>> That's encouraging news! Having your results documented will be really
>>>> helpful to everyone. Here's an attempt to answer your main question:
>>>>
>>>> 1) My feeling is similar to yours: in general I don't think recursively
>>>> feeding in classifier predictions is a good idea for predicting many
>>>> steps ahead. There are multiple predictions made at each time step.
>>>> These predictions branch into the future, and weird things can happen.
>>>> Suppose we fed in the most likely prediction at each time step. Here's
>>>> a simple failure case:
>>>>
>>>> A -> B (0.4) -> D (0.1)
>>>> |---> C (0.3) -> E (1.0)
>>>>
>>>> In this data, after A you get B with 40% chance and C with 30% chance.
>>>> After B the most likely element is D, but it only has a 10% chance. E
>>>> always follows C with 100% probability. If you feed the most likely
>>>> prediction from A back into the system, you would predict D two steps
>>>> ahead. However, E is a better 2-step prediction starting from A.
>>>>
>>>> Other issues can happen. Quite often the probabilities for the various
>>>> predictions are quite similar. If you just follow the most likely path,
>>>> then a small mistake (e.g. a small amount of noise) could throw it off.
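The failure case above is easy to verify numerically: greedily feeding the most likely one-step prediction back in lands on D, while summing probabilities over all two-step paths shows E is the better prediction. This is plain Python over the toy transition table, not CLA code:

```python
# One-step transition probabilities from the failure case above.
TRANS = {
    "A": {"B": 0.4, "C": 0.3},
    "B": {"D": 0.1},
    "C": {"E": 1.0},
}

def greedy_two_step(start):
    """Feed the most likely one-step prediction back in, twice."""
    mid = max(TRANS[start], key=TRANS[start].get)
    return max(TRANS[mid], key=TRANS[mid].get)

def best_two_step(start):
    """True two-step distribution: sum probabilities over all paths."""
    probs = {}
    for mid, p1 in TRANS[start].items():
        for end, p2 in TRANS.get(mid, {}).items():
            probs[end] = probs.get(end, 0.0) + p1 * p2
    return max(probs, key=probs.get)

print(greedy_two_step("A"))  # D -- greedy feedback follows A -> B -> D
print(best_two_step("A"))    # E -- P(E) = 0.3 beats P(D) = 0.04
```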
>>>> If you could somehow feed in all the probabilities at each time step,
>>>> then maybe you could do a better job, but that would be a lot more
>>>> involved and I'm not really sure how to do it with the CLA.
>>>>
>>>> For multi-step predictions we have tried the following options:
>>>>
>>>> a) For x = 1..31, train 31 different models, each predicting x steps
>>>> ahead. Each model is swarmed specifically for x. This gives the best
>>>> results, since the parameters for predicting one month into the future
>>>> could be different from those for one day. It sounds similar to what
>>>> you did, except for the custom swarming. Unfortunately, this is the
>>>> most time-consuming option because of the swarming step. Once you get
>>>> swarming working, you might want to try this with just one 7-step-ahead
>>>> model and see if that is better than your current 7-step model.
>>>>
>>>> b) Train one model to predict 31 days ahead and accumulate the results
>>>> to get all the predictions. So, tomorrow's prediction would have been
>>>> made 30 days ago by this model. Surprisingly, in some situations with
>>>> very regular data this works pretty well. Quite often it's not as good
>>>> as a).
>>>>
>>>> c) A combination of the above. For example, train 3 models to predict
>>>> 1 day, 7 days, and 31 days in advance. Accumulate using the closest
>>>> models. This is a compromise that can work well.
>>>>
>>>> d) Train a single model to predict 1, 2, 3, ..., 31 steps ahead (i.e.
>>>> all of them). You can do this by specifying a list of steps for the
>>>> steps-ahead parameter. We've had problems with this, though. The
>>>> classifier can take up a lot of memory in this setup. Also, often a
>>>> single set of parameters doesn't work well for all time ranges.
>>>>
>>>> Other questions:
>>>>
>>>> 2) It should. Scott might know better.
>>>>
>>>> 3) I don't know -- again, Scott might know this. If I remember
>>>> correctly, finishLearning is just an optimization step, so you can
>>>> ignore it.
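For option (d), the list of steps goes into the classifier parameters. In hotgym-style OPF model params this is the `'steps'` entry of `modelParams['clParams']`, given as a comma-separated string; the region name and alpha value below follow that example and may differ across NuPIC versions, so treat this as a sketch rather than a drop-in config:

```python
# Sketch of option (d): one classifier predicting 1..31 steps ahead.
steps = ",".join(str(i) for i in range(1, 32))  # "1,2,...,31"

CL_PARAMS = {
    "regionName": "CLAClassifierRegion",  # hotgym-era name; later versions use SDRClassifierRegion
    "verbosity": 0,
    "alpha": 0.005,  # hotgym default, not tuned for this dataset
    "steps": steps,  # memory use grows with every extra horizon
}
```

Each horizon in `steps` keeps its own prediction history inside the classifier, which is why memory becomes a problem in this setup.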
>>>> Turning learning off with disableLearning should work for testing.
>>>>
>>>> 4) Yes, you can run swarming within the VM. The main extra step is
>>>> that you need to install MySQL. There is a test script, "python
>>>> examples/swarm/test_db.py", to check that the DB is working. If that
>>>> works, swarming should work. See
>>>> https://github.com/numenta/nupic/wiki/Running-Swarms for details.
>>>>
>>>> This ended up being a really long email! Hopefully it was helpful.
>>>>
>>>> --Subutai
>>>>
>>>> On Thu, Oct 3, 2013 at 9:13 AM, Pedro Tabacof <[email protected]> wrote:
>>>>> Matt, I haven't uploaded my code anywhere yet. I'd like to try a few
>>>>> more things (which depend on the questions I asked) before I do,
>>>>> because I know that once I upload the code and post the results here
>>>>> I probably won't try to improve or change anything. I only work well
>>>>> under pressure, lol.
>>>>>
>>>>> Since I'm going to be away this weekend, I hope that by the end of
>>>>> next week I will set up a GitHub page with everything (explanation of
>>>>> the problem, dataset, code, and results with competition comparisons).
>>>>>
>>>>> Pedro.
>>>>>
>>>>> On Thu, Oct 3, 2013 at 12:56 PM, Matthew Taylor <[email protected]> wrote:
>>>>>> Pedro, this is exciting! Is your code available online anywhere? Any
>>>>>> chance you can put it up on GitHub or Bitbucket?
>>>>>>
>>>>>> ---------
>>>>>> Matt Taylor
>>>>>> OS Community Flag-Bearer
>>>>>> Numenta
>>>>>>
>>>>>> On Thu, Oct 3, 2013 at 6:59 AM, Pedro Tabacof <[email protected]> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I've been working with an energy competition dataset [1] and I've
>>>>>>> been experimenting with some different ways to predict many steps
>>>>>>> ahead (I have to predict 31 different energy loads for the whole
>>>>>>> month). This led me to some questions:
>>>>>>>
>>>>>>> 1) Has anyone tried feeding one-step classifier predictions back to
>>>>>>> the input?
>>>>>>> This can be done easily by hand, but I'm not sure if it is a good
>>>>>>> idea for many-step prediction.
>>>>>>>
>>>>>>> 2) Does "disableLearning" also turn off classifier learning? If
>>>>>>> not, how do I do it?
>>>>>>>
>>>>>>> 3) Is "finishLearning" deprecated? I tried using it but got an
>>>>>>> error message.
>>>>>>>
>>>>>>> 4) Is it possible to run swarming within the Vagrant VM? What about
>>>>>>> Cerebro?
>>>>>>>
>>>>>>> On a side note, so far I have achieved 3.3% MAPE on the test data,
>>>>>>> which would put me among the top 10 competitors (out of 26), with
>>>>>>> pretty much the basic NuPIC configuration, very similar to the
>>>>>>> hotgym example.
>>>>>>>
>>>>>>> I have experimented with 31-step predictions and 1,2,3,...,31-step
>>>>>>> predictions, but this was too slow and didn't improve the results.
>>>>>>> When I finish testing all my ideas, I will post my results and
>>>>>>> experience here.
>>>>>>>
>>>>>>> Pedro.
>>>>>>>
>>>>>>> [1] http://neuron.tuke.sk/competition/index.php
>>>>>>> --
>>>>>>> Pedro Tabacof,
>>>>>>> Unicamp - Eng. de Computação 08.

--
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
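For reference, the 3.3% MAPE figure above is the mean absolute percentage error. A quick sketch; the numbers below are made-up illustration data, not the competition loads:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (zero actuals skipped)."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)

print(mape([100.0, 200.0, 400.0], [97.0, 206.0, 400.0]))  # ~2.0
```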
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
