Daniel,

It's hard to say without seeing the data. Perhaps split by domain instead of by URL? For a given URL, how many requests per second are we talking about?
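Something like this rough sketch is what I have in mind by "one model per domain" (it assumes NuPIC's OPF ModelFactory; MODEL_PARAMS and the field names like "requestsPerSecond" are just placeholders you would swap for your own):

    # Sketch only: route each request record to a per-domain model instead of
    # a per-URL model. MODEL_PARAMS and the field names are placeholders.
    from urlparse import urlparse  # Python 2, which NuPIC targets

    from nupic.frameworks.opf.modelfactory import ModelFactory

    from model_params import MODEL_PARAMS  # hypothetical: your own OPF model params

    models = {}  # one small HTM model per domain

    def get_model_for(domain):
        """Lazily create a model the first time we see a domain."""
        if domain not in models:
            model = ModelFactory.create(MODEL_PARAMS)
            model.enableInference({"predictedField": "requestsPerSecond"})
            models[domain] = model
        return models[domain]

    def handle_record(record):
        """record is a dict like {"url": ..., "timestamp": ..., "requestsPerSecond": ...}."""
        domain = urlparse(record["url"]).netloc
        model = get_model_for(domain)
        result = model.run({
            "timestamp": record["timestamp"],
            "requestsPerSecond": record["requestsPerSecond"],
        })
        # "anomalyScore" is only present if MODEL_PARAMS uses a TemporalAnomaly
        # inference type; adjust to whatever inference you actually run.
        return domain, result.inferences["anomalyScore"]

Keep in mind the dict of models grows with the number of domains, which is exactly where the CPU/memory limits I mention below start to bite.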
Wakan,

You can't push too much data into a model, but you might be expecting it to understand more than it actually can. You might only define a few fields of data for a model, yet that data might have very high cardinality. One model will have a hard time learning all the patterns in a high-cardinality input space. From my experience, it is best to break high-cardinality input into multiple streams that each represent an individual data source.

In this example, Daniel seems to have data with very high cardinality. There are many website users browsing the web, and there are many websites with specific domains serving pages to all those users. Expecting one tiny section of simulated cortex to understand all of it might be asking too much.

Everyone,

Keep in mind that the models we create with NuPIC or HTM.Java are very small, non-hierarchical regions of cortex. They are the foundational building blocks of HTM theory, but they cannot handle lots of input fields or data with high cardinality. My tactic for working around this has been to split the data and build out more models. Sometimes that works great, but other times I realize I don't have the CPU power or memory to run the 1000 models I would need to really understand the problem. If you run into the same problem, you should look into scalability projects like HTM Engine or HTM-Moclu.

- https://github.com/numenta/numenta-apps/tree/master/htmengine
- https://github.com/antidata/htm-moclu

Regards,
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Sun, Feb 14, 2016 at 1:40 PM, Daniel Rice <[email protected]> wrote:
> We've tried breaking it up by URL and having one model per URL already, but
> there's still too much data to get through quickly. Do you have any other
> suggestions for breaking it up so the models can run faster?
>
> Thanks,
> Daniel Rice
