Daniel,

It's hard to say without seeing the data. Perhaps by domain instead of
URL? For a given URL, how many requests per second are we talking
about?

Wakan,

You can't really push too much data into a model, but you might be
expecting it to understand more than it actually can. You might only
define a few input fields for a model, yet the data in those fields
might have very high cardinality. One model will have a hard time
learning all the patterns across high-cardinality input.

From my experience, it is best to break high-cardinality input into
multiple streams, each representing one coherent slice of the
data. In this example, Daniel seems to have data
with very high cardinality. There are many website users browsing the
web, and there are many websites with specific domains serving pages
to all those users. Expecting one tiny section of simulated cortex to
understand it might be asking too much.

Everyone,

Keep in mind that the models we create with NuPIC or HTM.Java are very
small, non-hierarchical regions of cortex. They are the foundational
building blocks of HTM theory, but they cannot handle lots of input
fields or data with high cardinality. My tactic to work around this
has been to split the data and build out more models. Sometimes that
works great, but other times I realize I don't have the CPU power or
memory to run the 1,000 models I need to really understand the
problem. If you run into the same problem, you should look into
scalability projects like HTM Engine or HTM-Moclu.

- https://github.com/numenta/numenta-apps/tree/master/htmengine
- https://github.com/antidata/htm-moclu
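
The split-the-data tactic might look roughly like the sketch below. To
keep it self-contained, `TinyModel` and `create_model` are stand-ins I
made up, not a real NuPIC or HTM.Java API; in practice you would
substitute something like `ModelFactory.create(...)` and feed each
record through `model.run(...)`.

```python
# Sketch: one model per data stream instead of one model for everything.
# `TinyModel` / `create_model` are hypothetical placeholders for a real
# HTM model and its factory.

from collections import defaultdict

class TinyModel:
    """Placeholder for a real HTM model; it just counts records."""
    def __init__(self):
        self.records_seen = 0

    def run(self, record):
        # A real model would encode the record and update its state here.
        self.records_seen += 1

def create_model():
    return TinyModel()

# Key the models by domain, so each model only sees one stream.
models = defaultdict(create_model)

requests = [
    {"domain": "example.com", "latency": 120},
    {"domain": "example.org", "latency": 80},
    {"domain": "example.com", "latency": 95},
]

for req in requests:
    # Route each record to the model for its stream.
    models[req["domain"]].run(req)

print(len(models))                          # 2 domains -> 2 models
print(models["example.com"].records_seen)   # 2
```

The same pattern works for any partition key (URL, user, region); the
trade-off, as noted above, is that model count grows with cardinality,
which is exactly where CPU and memory run out.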

Regards,
---------
Matt Taylor
OS Community Flag-Bearer
Numenta


On Sun, Feb 14, 2016 at 1:40 PM, Daniel Rice <[email protected]> wrote:
> We've tried breaking it up by URL and having one model per URL already, but
> there's still too much data to get through quickly. Do you have any other
> suggestions for breaking it up so the models can run faster?
>
> Thanks,
> Daniel Rice
