Thanks for sharing that, Daniel. I can only go off your graphic (it'd be great if you could put the actual data on GitHub along with the description.py), but it looks like you're showing 5 days of hourly data there. A couple of points:

1. Predictions 1-10 hours ahead are the kind of thing you would expect NuPIC to be able to learn from this kind of data. 100 and certainly 1000 hours ahead are likely not to work at all.

2. It's not clear how you're representing the time of day here (as Scott says) - this should ideally be represented using a slightly overlapping scalar encoder with plenty of bits. Time of day is hugely important in gross web traffic data at an hourly scale (and not just for NuPIC).

3. Your daily cycles are quite noisy - there seems to be a missing daily peak in the middle of the raw trace (a peak which the 1-step trace seems to attempt to predict). I'm not sure whether you've included a field for business day vs weekend/holiday in your input data.

4. The peaks are unevenly spaced and seem to oscillate between steep up-gradual down and the opposite. There could be some underlying cause for this, but it may be that the data is too aggregated for NuPIC to spot and learn the temporal patterns.

5. The one step plot is not far away from the raw data (except for the missing peak). Either that, or you aren't plotting the lag and it's just following the last datum.

6. Are these visits/visitors or pageviews? Depending on the site, pageview data for small numbers of visitors can be very lumpy, whereas visitor numbers in the hundreds per hour should be easier to model.

If you can, please share the raw data and setup - it'd be very interesting to see how we can crowdsource a better set of predictions. I'll go hunting for some old logs myself and see if I can come up with a HOWTO for running NuPIC on web data.

Thanks again for your query, this is the kind of thing that many people would be interested in testing NuPIC on.

Regards,

Fergal Byrne

On Tue, May 13, 2014 at 10:01 PM, Scott Purdy <[email protected]> wrote:

> You will likely need to run a swarm. In particular, you should give it the
> option to include a time-of-day field and make sure it can select a very
> fine encoding for that field. Also the standard permute options would be a
> good idea.
>
>
> On Tue, May 13, 2014 at 12:22 PM, Matthew Taylor <[email protected]> wrote:
>
>> Can you paste the model params you are using? Looking at the chart, my
>> guess is that it's not being processed as temporal data. Are you using
>> a timestamp? Seeing your input CSV for the swarm (or at least a part
>> of it) would be helpful, too.
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>>
>> On Tue, May 13, 2014 at 12:13 PM, Daniel Cohen <[email protected]>
>> wrote:
>> > This is a data set taken from google analytics - it is almost 2 years of web
>> > traffic data by date hour - that's about 4000 data points.
>> >
>> > I ran through the same process I used for the sine wave prediction tutorial
>> > except I added more prediction steps. The attached .png is a zoom in of the
>> > plot. Even the 1 step prediction is disappointing and I'd expect better
>> > after 4000 data points. The 10 step prediction is almost wholly unreliable
>> > and anything beyond that is useless. Sure the predictions are within the
>> > right range of points but you couldn't base anything useful off them. They
>> > don't even seem to have picked up on the hourly modulation throughout a day.
>> >
>> > Is there any way to improve the prediction quality without simply using more
>> > data points?
>> >
>> > _______________________________________________
>> > nupic mailing list
>> > [email protected]
>> > http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Fergal Byrne, Brenter IT

Author, Real Machine Intelligence with Clortex and NuPIC
https://leanpub.com/realsmartmachines

Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
http://euroclojure.com/2014/
and at LambdaJam Chicago, July 2014: http://www.lambdajam.com

http://inbits.com - Better Living through Thoughtful Technology
http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne

e:[email protected] t:+353 83 4214179
Join the quest for Machine Intelligence at http://numenta.org
Formerly of Adnet [email protected] http://www.adnet.ie
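
P.S. To make point 2 a bit more concrete, here is a rough sketch in plain Python of what I mean by a slightly overlapping time-of-day encoding, plus the weekend flag from point 3. This is not NuPIC's encoder API: encode_hour, overlap and is_weekend are names I've made up for illustration, and the sizes (240 bits, 15 active) are arbitrary assumptions rather than recommended settings.

```python
from datetime import datetime

N_BITS = 240   # total bits in the representation (assumed value)
WIDTH = 15     # number of active bits per hour (assumed value)


def encode_hour(hour, n_bits=N_BITS, width=WIDTH):
    """Return the set of active bit indices for an hour of day in [0, 24).

    The active block wraps around the end of the bit array, so late
    evening and early morning hours share bits, as they should.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    # Each hour shifts the block by n_bits / 24 positions (10 here).
    start = int(round(hour * n_bits / 24.0)) % n_bits
    return {(start + i) % n_bits for i in range(width)}


def overlap(hour_a, hour_b):
    """Count the active bits two hours of the day have in common."""
    return len(encode_hour(hour_a) & encode_hour(hour_b))


def is_weekend(timestamp):
    """True for Saturday or Sunday; holidays would need a separate lookup."""
    return timestamp.weekday() >= 5


if __name__ == "__main__":
    # With the defaults, adjacent hours share 5 of their 15 active bits
    # and hours two or more apart share none.
    for a, b in [(9, 9), (9, 10), (23, 0), (3, 15)]:
        print(a, b, overlap(a, b))
    print(is_weekend(datetime(2014, 5, 17)))
```

The idea is only that neighbouring hours look similar to the model while distant hours look different. In practice the swarm should be allowed to choose the actual widths and resolutions, as Scott suggests.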
