Hello Subutai,

Yes, I used a static scalar encoder with the min/max values set by hand. I
didn't use the extremes of the data distribution, because there were some
weird values; I arbitrarily made the scale a little "tighter" than the
data.
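(For anyone curious, here is one way to pick a "tighter" min/max than the
raw extremes. The percentile cutoffs and the helper below are just my own
illustration, not what I actually did:)

```python
# Illustrative only: pick scalar-encoder min/max from percentiles
# rather than the raw extremes, so a few weird values don't
# stretch the encoding range.
def tightened_range(values, lower_pct=10.0, upper_pct=90.0):
    vals = sorted(values)
    n = len(vals)
    lo = vals[int((n - 1) * lower_pct / 100.0)]
    hi = vals[int((n - 1) * upper_pct / 100.0)]
    return lo, hi

# Made-up load readings with one outlier (300.0).
loads = [10.0, 12.5, 11.0, 300.0, 9.5, 13.0, 12.0, 11.5, 10.5, 12.2]
minval, maxval = tightened_range(loads)
print(minval, maxval)  # 9.5 13.0 - the outlier no longer sets maxval
```

These two values would then be handed to the encoder as its min/max.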

I forgot to add this to the tips: by removing the summertime data, I could
cut the scale roughly in half, so I ended up with more granularity without
any added complexity, which probably helped a lot. Removing data was a
great trade-off.

I have to admit that I stopped trying to get swarming running too early,
before your last email on the other thread. I realized I hadn't tried all
the strategies, and by the time I got to the "winning" one it was getting
too late, so I decided to settle for that result, which was better than I
had hoped for.

I will try swarming on the next project I work on. I'm looking at some
Kaggle problems now; within the next two or three weeks I expect to start
a thread here looking for possible partners.
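For reference, here is the expected-value computation from tip 5 below as
a self-contained snippet. The predictions dict is made up for
illustration; in practice it comes from
result.inferences['multiStepPredictions'][k_steps]:

```python
# Expected value over a predicted-value distribution (tip 5 below).
# Keys are predicted scalar values, values are their probabilities;
# the numbers here are made up for illustration.
predictions = {10.0: 0.2, 12.0: 0.5, 20.0: 0.3}

expectation = 0.0
total_probability = 0.0
for value, probability in predictions.items():
    expectation += float(value) * float(probability)
    total_probability += float(probability)
expectation /= total_probability

print(expectation)  # 14.0
```

Dividing by the total probability makes the result robust even when the
probabilities don't quite sum to 1.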

Pedro.


On Wed, Oct 16, 2013 at 4:44 PM, Subutai Ahmad <[email protected]> wrote:

> Hi Pedro,
>
> Thanks for the list of tips - this is very helpful. Adding expected
> prediction as an option to the InferenceElement is a really nice idea.
> Hopefully we can get this all documented on the wiki for other people.
>
> Did you ever set the min/max values on the encoder? I wonder if that made
> a difference.
>
> I'm still concerned that you didn't get swarming working. Maybe we can fix
> that as well at some point.
>
> --Subutai
>
>
> On Mon, Oct 14, 2013 at 9:44 AM, Scott Purdy <[email protected]> wrote:
>
>> Re: #5 - Perhaps you could add that as a method to the InferenceElement
>> to get the probability-weighted prediction.
>>
>> https://github.com/numenta/nupic/blob/master/py/nupic/frameworks/opf/opfutils.py#L40
>>
>>
>> On Sun, Oct 13, 2013 at 10:49 PM, Pedro Tabacof <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I think this really deserves another thread, so I apologize for the
>>> inconvenience of many emails. These are the main lessons I learned with my
>>> first successful NuPIC application (see "Electricity forecast competition
>>> results" thread for the problem explanation):
>>>
>>> 1) I only needed 390 data samples for the best result. Discarding
>>> irrelevant data actually improved my results: since I was trying to
>>> predict the energy load of a winter month, summer data was of no use.
>>>
>>> 2) The parameters ended up very close to the hotgym example. The only
>>> thing I recall changing is "pamLength"; everything else stayed the
>>> same. For me this is very motivating, because fiddling with parameters
>>> is the worst part of machine learning (and I couldn't get swarming
>>> running).
>>>
>>> 3) Never do multi-step prediction with a single model; use a separate
>>> model per prediction horizon. For 31 different predictions, using just
>>> one model would have taken a whole day, while using 31 separate models
>>> took only an hour.
>>>
>>> 4) Don't worry if the scaling ends up too coarse. In my case it was
>>> faster and more precise with a coarser scale, since I didn't have much
>>> training data.
>>>
>>> 5) Finally, the most important tip, which I can't believe hasn't been
>>> discussed here: when doing scalar prediction, I found it best to use
>>> the expected value of the prediction distribution, not just the value
>>> with the highest probability. Doing this is really simple:
>>>     predictions = result.inferences['multiStepPredictions'][k_steps]
>>>     expectation = 0.0
>>>     total_probability = 0.0
>>>     for value, probability in predictions.items():
>>>         expectation += float(value) * float(probability)
>>>         total_probability += float(probability)
>>>     expectation /= total_probability
>>>
>>> This greatly improved my results, and I think it should be standard
>>> when doing scalar predictions, or at least there should be an option
>>> for it. From a statistical point of view, it seems like the most
>>> logical choice for predicting scalar values. If there already is an
>>> option for this, please pardon my ignorance.
>>>
>>> Pedro.
>>>
>>> p.s. I have to thank Subutai Ahmad for most of the tips; his expertise
>>> was invaluable to me.
>>>
>>> --
>>> Pedro Tabacof,
>>> Unicamp - Eng. de Computação 08.
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>>
>>
>


-- 
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
