Hello Subutai,

Yes, I used a static scalar encoder with min/max values set by hand. I didn't
use the extreme values of the data distribution, because there were some weird
outliers - I arbitrarily made the scale a little "tighter" than the data.
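[Editor's sketch: why a "tighter" min/max buys granularity. The bucket-width
formula (maxval - minval) / (n - w + 1) is the standard one for a simple
scalar encoder; the bounds and n/w values below are hypothetical, not the
actual settings used here.]

```python
# Sketch: tightening the min/max bounds of a scalar encoder shrinks the
# width of each bucket, i.e. gives finer granularity for the same n and w.

def encoder_resolution(minval, maxval, n=400, w=21):
    """Width of one encoder bucket: smaller means finer granularity."""
    return (maxval - minval) / float(n - w + 1)

# Full (outlier-inflated) range vs. a hand-picked "tighter" range:
full = encoder_resolution(0.0, 2000.0)     # includes extreme values
tight = encoder_resolution(400.0, 1400.0)  # clipped to the typical range

print(full, tight)  # halving the span roughly halves the bucket width
```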
I forgot to add to the tips: by removing the summertime data, I could cut the
scale roughly in half, so I ended up with more granularity at no extra
complexity, which probably helped a lot. Removing data was a great trade-off.

I have to admit that I stopped trying to get swarming running too early,
before your last email on the other thread. I realized I hadn't tried every
strategy, and by the time I got to the "winning" one it was getting late, so I
decided to settle for that result, which was better than I had hoped for. I
will try swarming on the next project I work on. I'm looking at some Kaggle
problems now; within the next two or three weeks I expect to start a thread
here looking for possible partners.

Pedro.

On Wed, Oct 16, 2013 at 4:44 PM, Subutai Ahmad <[email protected]> wrote:

> Hi Pedro,
>
> Thanks for the list of tips - this is very helpful. Adding the expected
> prediction as an option to the InferenceElement is a really nice idea.
> Hopefully we can get this all documented on the wiki for other people.
>
> Did you ever set the min/max values on the encoder? I wonder if that made
> a difference.
>
> I'm still concerned that you didn't get swarming working. Maybe we can fix
> that as well at some point.
>
> --Subutai
>
> On Mon, Oct 14, 2013 at 9:44 AM, Scott Purdy <[email protected]> wrote:
>
>> Re: #5 - Perhaps you could add that as a method to the InferenceElement
>> to get the probability-weighted prediction.
>>
>> https://github.com/numenta/nupic/blob/master/py/nupic/frameworks/opf/opfutils.py#L40
>>
>> On Sun, Oct 13, 2013 at 10:49 PM, Pedro Tabacof <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I think this really deserves another thread, so I apologize for the
>>> inconvenience of so many emails.
>>> These are the main lessons I learned with my first successful NuPIC
>>> application (see the "Electricity forecast competition results" thread
>>> for the problem description):
>>>
>>> 1) I only needed 390 data samples for the best result. Discarding
>>> irrelevant data actually improved my results (since I was trying to
>>> predict the energy load of a winter month, summer data was of no use
>>> to me).
>>>
>>> 2) The parameters ended up very close to the hotgym example. The only
>>> thing I recall changing is "pamLength"; everything else stayed the
>>> same. For me this is very motivating, because having to fiddle with
>>> parameters is the worst part of machine learning (and I couldn't get
>>> swarming running).
>>>
>>> 3) Never do multiple-step predictions with a single model; use a
>>> separate model per step instead. For 31 different predictions, using
>>> just one model would have taken a whole day, while 31 separate models
>>> took only an hour.
>>>
>>> 4) Don't worry if the scaling ends up too coarse. In my case the model
>>> was faster and more precise with a coarser scale, since I didn't have
>>> much training data.
>>>
>>> 5) Finally, the most important tip, which is something I can't believe
>>> hasn't been discussed here: when doing scalar prediction, I found it
>>> best to use the expectation of the prediction distribution, not the
>>> highest-probability value. Doing this is really simple:
>>>
>>> expectation = 0.0
>>> total_probability = 0.0
>>> for i in result.inferences['multiStepPredictions'][k_steps]:
>>>     expectation += float(i) * float(result.inferences['multiStepPredictions'][k_steps][i])
>>>     total_probability += float(result.inferences['multiStepPredictions'][k_steps][i])
>>> expectation = expectation / total_probability
>>>
>>> This greatly improved my results, and I think it should be the standard
>>> when doing scalar predictions, or at least there should be an option
>>> for it.
>>> From a statistical point of view, this seems like the most logical
>>> choice for predicting scalar values. If there already is an option for
>>> this, please pardon my ignorance.
>>>
>>> Pedro.
>>>
>>> p.s. I have to thank Subutai Ahmad for most of the tips; his expertise
>>> was invaluable to me.
>>>
>>> --
>>> Pedro Tabacof,
>>> Unicamp - Eng. de Computação 08.
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
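[Editor's sketch: tip #5 as a standalone function. `predictions` stands in
for result.inferences['multiStepPredictions'][k_steps], which maps each
predicted scalar value to its probability; the sample values below are made
up for illustration.]

```python
# Probability-weighted expectation over a prediction distribution, instead
# of taking the single highest-probability value.

def expected_value(predictions):
    """Return the probability-weighted mean of a {value: probability} dict.

    Probabilities need not sum to 1; the result is normalized by their total.
    """
    expectation = 0.0
    total_probability = 0.0
    for value, prob in predictions.items():
        expectation += float(value) * float(prob)
        total_probability += float(prob)
    return expectation / total_probability

# Example: the highest-probability bucket is 500, but the expectation
# blends all three candidate values.
predictions = {500.0: 0.5, 520.0: 0.25, 580.0: 0.25}
print(expected_value(predictions))  # → 525.0
```

The design choice here is the same as in the snippet above: normalizing by
total_probability makes the result robust when the reported probabilities do
not sum exactly to 1.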
