Hi Daniel,

It is often tough to know how to improve results after the initial run! I
looked through your files and found several issues, which I have listed below.

Before the fixes, 10-step-ahead prediction with your original files gave a
MAPE (mean absolute percentage error) of 40%. After the changes the error
was much lower: around 18% for 10-step-ahead prediction, and 13.1% for
1-step prediction.
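For anyone unfamiliar with the metric: MAPE averages the absolute prediction
error as a percentage of the actual value. Here is a minimal Python sketch.
The function names are hypothetical (they are not part of NuPIC), zero actual
values are skipped because the percentage is undefined there, and NuPIC's own
metrics may handle details like that differently. For k-step-ahead
prediction, the forecast made at time t is scored against the actual value at
time t + k.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage.

    Pairs whose actual value is zero are skipped, since the percentage
    error is undefined there (one convention among several).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(errors) / len(errors)


def align_k_step(actual, forecasts, k):
    """Pair forecasts[t], made at time t for time t + k, with actual[t + k].

    Assumes one forecast per input record, so both lists have equal length.
    """
    n = len(actual)
    return actual[k:], forecasts[:n - k]


if __name__ == "__main__":
    # Illustrative values only; the last k forecasts have no target yet.
    actual = [100, 200, 150, 120]
    two_step_forecasts = [190, 160, 0, 0]
    a, f = align_k_step(actual, two_step_forecasts, 2)
    print("2-step MAPE: %.1f%%" % mape(a, f))
```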

I don’t know if these are optimal results, and there may be additional
things you can do to improve accuracy, but I hope this is at least a step
in the right direction.

As a community it would be good to build up knowledge along these lines,
including the common mistakes and pitfalls people run into. Debugging
accuracy is quite different from debugging code!

—Subutai

The updated JSON file is here: http://pastebin.com/wnBMFVKs

The reformatted CSV file is here:
https://drive.google.com/file/d/0B9oJdZFdnTAKWG1PSm5mMHNULVE/edit?usp=sharing

Here are the issues I found and fixed:

1) Dates are in decreasing order. For NuPIC, time should always be
increasing.

2) The date format was incorrect, so NuPIC was parsing the dates as floats.

3) The includedFields attribute did not actually include the date field.

4) Min/max values were not specified for visits. Performance is usually a
bit better with the min/max specified.

5) For best accuracy, it is usually better to swarm separately for each
prediction step, as we have found that the best parameters differ slightly
between prediction windows. If you specify both 1 and 10, I believe the
swarm will optimize for 1. In my experiments I ran separate swarms for
1-step and 10-step prediction.
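The fixes in items 1 to 5 above can be sketched in Python. This is a minimal
sketch under several assumptions, not the exact files linked above: the raw
export is assumed to be a CSV with one header row followed by (dateHour,
visits) rows, where dateHour is a Google Analytics style YYYYMMDDHH string;
the output uses NuPIC's three-row CSV header (field names, field types,
flags, with T marking the timestamp); timestamps are written as YYYY-MM-DD
HH:MM:SS, assumed to be accepted by NuPIC's date parser. The swarm
description keys and values are modeled on the NuPIC swarming examples and
should be checked against the version you run. The helper names and the
visits_nupic.csv path are hypothetical.

```python
import csv
import io
import json
from datetime import datetime

# ASSUMPTION: raw rows look like "2014052605,42" (dateHour as YYYYMMDDHH).
RAW_DATE_FORMAT = "%Y%m%d%H"
NUPIC_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed acceptable to NuPIC


def load_rows(raw_csv_text):
    """Parse the raw export into (datetime, int) pairs, oldest first."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    next(reader)  # skip the raw header row
    rows = [(datetime.strptime(r[0].strip(), RAW_DATE_FORMAT), int(r[1]))
            for r in reader if r]
    rows.sort(key=lambda r: r[0])  # item 1: time must be increasing
    return rows


def to_nupic_csv(rows):
    """Item 2: write the three-row header and real timestamps, not floats."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "visits"])  # field names
    writer.writerow(["datetime", "int"])      # field types
    writer.writerow(["T", ""])                # flags: T marks the timestamp
    for ts, visits in rows:
        writer.writerow([ts.strftime(NUPIC_DATE_FORMAT), visits])
    return out.getvalue()


def swarm_description(rows, step, csv_path="visits_nupic.csv"):
    """Items 3 to 5: one swarm description per prediction step."""
    values = [v for _, v in rows]
    return {
        "includedFields": [
            {"fieldName": "timestamp", "fieldType": "datetime"},  # item 3
            # Item 4: min/max taken straight from the data (no headroom).
            {"fieldName": "visits", "fieldType": "int",
             "minValue": min(values), "maxValue": max(values)},
        ],
        "streamDef": {
            "info": "visits",
            "version": 1,
            "streams": [{"info": "visits", "source": "file://" + csv_path,
                         "columns": ["*"]}],
        },
        "inferenceType": "TemporalMultiStep",
        # Item 5: a single prediction step per swarm.
        "inferenceArgs": {"predictionSteps": [step],
                          "predictedField": "visits"},
        "iterationCount": -1,
        "swarmSize": "medium",
    }


if __name__ == "__main__":
    # Illustrative rows in decreasing time order, as in the original file.
    sample = "dateHour,visits\n2014052602,7\n2014052601,12\n2014052600,30\n"
    rows = load_rows(sample)
    print(to_nupic_csv(rows))
    for step in (1, 10):  # separate swarms for 1-step and 10-step prediction
        print(json.dumps(swarm_description(rows, step), indent=2))
```

In practice each description would be saved to its own JSON file and swarmed
separately, and the model from each swarm used only for its own step.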



On Mon, May 26, 2014 at 5:13 AM, Daniel Cohen <[email protected]> wrote:

> I asked a while ago, but when I responded to further questions I didn't get
> a further response, so here I am asking again.
>
> This is a data set taken from Google Analytics: almost 2 years of web
> traffic data by date and hour, which is about 4000 data points.
>
> I ran through the same process I used for the sine wave prediction tutorial,
> except that I added more prediction steps. The attached .png is an extract
> of the plot. Even the 1-step prediction is disappointing, and I'd expect
> better after 4000 data points. The 10-step prediction is almost wholly
> unreliable and anything beyond that is useless. Sure, the predictions are
> within the right range, but you couldn't base anything useful on them. They
> don't even seem to have picked up on the hourly modulation throughout a
> day.
>
> Is there any way to improve the prediction quality without simply using
> more data points?
>
> search_def.json <http://justpaste.it/fhwk>
>
> visits.csv
> <https://drive.google.com/file/d/0B8imHXOv0rGcSXVmREV6QTVlTjA/edit?usp=sharing>
>
> The visits are by date and hour to increase the number of data points. I
> wasn't expecting great things from the 1000-step prediction, but I did want
> to test it. I did expect much more from the 10-step and 100-step
> predictions, though.
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
