Also, the suggestions to compress the existing serialized state should help a lot!
On Fri, Dec 18, 2015 at 10:32 AM, Scott Purdy <[email protected]> wrote:

> Great questions.
>
> - We consider HTM models to be "fixed resources" that use a constant
> amount of memory. However, we may not allocate all of this memory right
> up front. Since many problems will never use all of the possible
> segments, we allocate segments as they are created.
>
> - If you are using the new Python implementation of Temporal Memory, the
> number of segments is not limited. This is something we want to allow,
> but it should definitely not be the default. See
> https://github.com/numenta/nupic/issues/1588
>
> - Yes, the Cap'n Proto serialization will be MUCH smaller. I doubt there
> will be much decrease in size possible with any technique once we switch
> over.
>
> On Fri, Dec 18, 2015 at 8:03 AM, Marek Otahal <[email protected]> wrote:
>
>> I did a quick search for answers about Cap'n Proto:
>> https://capnproto.org/encoding.html
>>
>> PACKING - a compression technique Cap'n Proto itself offers (it just
>> discards excess zeros - think sparse vs. dense vector). Scott: are we
>> enabling this? IMHO we should from the start, to avoid
>> backwards-compatibility issues later.
>>
>> COMPRESSION - for repetitive data, which HTM networks typically are,
>> they suggest compression with an external tool. Matt, this is something
>> that can/has to be implemented regardless of capnp/pickle. The OPF code
>> responsible for writing the file should afterwards call a compression
>> program; from my experience it helps enormously on the current (pickle)
>> data.
>>
>> On Fri, Dec 18, 2015 at 4:22 PM, Matthew Taylor <[email protected]> wrote:
>>
>>> BTW, we have been working on a new serialization format. The old one
>>> uses Python's pickle functionality, and there are several problems
>>> with it. The new method in NuPIC will use Cap'n Proto serialization,
>>> which is a very fast and efficient technique that happens on the C++
>>> side (and through the pycapnp adapter in Python).
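[Editor's note: Marek's external-compression point above is easy to try outside NuPIC. A minimal sketch using only the Python standard library - the model state here is a made-up stand-in for serialized HTM data, not actual NuPIC output:]

```python
import gzip
import pickle

# Hypothetical stand-in for serialized HTM state: mostly-zero, highly
# repetitive data, which is roughly what permanence matrices look like.
fake_state = {
    "permanences": [0.0] * 200000,
    "active_cells": list(range(0, 200000, 500)),
}

raw = pickle.dumps(fake_state)   # what a pickle-based save produces
packed = gzip.compress(raw)      # what a post-write compression step adds

# On repetitive data like this, the gzip copy is dramatically smaller.
print(len(raw), len(packed))
```

[This is the same effect Marek reports on real checkpoints ("from hundreds of MB to 10s"); any external compressor called after the OPF write would do.]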
>>> Once we have this finished, the time it takes to save and retrieve
>>> models should decrease by about tenfold (based on Scott's initial
>>> experiments). I assume this will come along with a considerable
>>> decrease in serialization size on disk, but I have not checked. If
>>> Scott is reading this, maybe he can answer.
>>>
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>> On Fri, Dec 18, 2015 at 2:46 AM, David Ray <[email protected]> wrote:
>>>
>>>> Hi Karin,
>>>>
>>>> "the network can't really grow new connections, which are not yet
>>>> stored in the memory, right? (other than adjusting weights of the
>>>> connections)"
>>>>
>>>> The network does in fact grow new connections: Distal Dendrites are
>>>> formed with Synapses housing new connections to other Cells. This is
>>>> one of the most distinguishing features of HTM Neurons as opposed to
>>>> "point neurons" (i.e., A-to-Z NNs, a.k.a. "Deep" Neural Networks).
>>>>
>>>> See:
>>>> https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py#L361
>>>>
>>>> ...starting above from the "pickCellsToLearnOn()" method...
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Dec 18, 2015, at 4:10 AM, Karin Valisova <[email protected]> wrote:
>>>>
>>>> Thank you for your answers!
>>>>
>>>> Matthew, what do you mean by "how much data the model has seen"? I
>>>> have noticed that the size of the network increases with the size of
>>>> the data sample, but I can't really see a reason for that - the
>>>> network can't really grow new connections that are not yet stored in
>>>> memory, right? (other than adjusting the weights of the connections)
>>>> And if it's a matter of the model accumulating data somewhere, for
>>>> calculating sliding-window metrics or things like that, then it can
>>>> theoretically be cut off - if we're talking only about the network's
>>>> ability to process data.
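[Editor's note: David's and Scott's point - segments are allocated as they are created, so serialized size grows with the data seen - can be reduced to a toy sketch. These are not NuPIC's real classes, just an illustration of lazy segment allocation:]

```python
# Toy sketch (not NuPIC's actual data structures): distal segments are
# grown on demand, so the state that must be serialized grows as the
# model sees more data.
class Cell:
    def __init__(self):
        self.segments = []  # distal dendrite segments, created lazily

    def learn(self, presynaptic_cells):
        # Grow a new segment holding synapses to the previously active cells.
        self.segments.append(list(presynaptic_cells))

cell = Cell()
for step in range(3):       # each novel pattern grows new structure
    cell.learn([step, step + 1])

print(len(cell.segments))   # 3 - more data seen, more state to save
```

[This is why a "fixed resource" model can still produce checkpoints whose size depends on how much data it has processed: the memory ceiling is constant, but the allocated fraction is not.]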
>>>> Mark, what kind of compression do you have in mind? Any ideas what
>>>> to try?
>>>>
>>>> Thank you,
>>>> Karin
>>>>
>>>> On Thu, Dec 17, 2015 at 7:29 PM, Marek Otahal <[email protected]> wrote:
>>>>
>>>>> Hi Karin,
>>>>>
>>>>> yes, that is an issue! I've suggested using compression; it helps
>>>>> surprisingly well in this matter (from hundreds of MB to 10s, ...).
>>>>> Afaik it's not implemented yet.
>>>>>
>>>>> Cheers,
>>>>> Mark
>>>>>
>>>>> On Thu, Dec 17, 2015 at 6:15 PM, Matthew Taylor <[email protected]> wrote:
>>>>>
>>>>>> That's not too surprising ;). The size of a saved model depends on
>>>>>> several things, including the number of input fields, model
>>>>>> parameters that affect how cells connect, and how much data the
>>>>>> model has seen. There are thousands of connections between cells
>>>>>> that need to be persisted when a model is saved. I have seen
>>>>>> serialized models much larger than 50 MB.
>>>>>> ---------
>>>>>> Matt Taylor
>>>>>> OS Community Flag-Bearer
>>>>>> Numenta
>>>>>>
>>>>>> On Thu, Dec 17, 2015 at 8:06 AM, Karin Valisova <[email protected]> wrote:
>>>>>> > Hello!
>>>>>> >
>>>>>> > I've been playing around with serialization under the OPF
>>>>>> > framework, and I noticed that when using the typical model for
>>>>>> > temporal anomaly detection
>>>>>> >
>>>>>> > https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py
>>>>>> >
>>>>>> > the size of the saved file gets surprisingly large, ~50 MB. What
>>>>>> > is the reason for this? If I understand correctly, only the
>>>>>> > states of the temporal and spatial poolers should be enough to
>>>>>> > reload a network, right? Or am I forgetting about some extra data
>>>>>> > stored?
>>>>>> >
>>>>>> > Thank you!
>>>>>> > Karin
>>>>>
>>>>> --
>>>>> Marek Otahal :o)
>>>>
>>>> --
>>>>
>>>> datapine GmbH
>>>> Skalitzer Straße 33
>>>> 10999 Berlin
>>>>
>>>> email: [email protected]
>>
>> --
>> Marek Otahal :o)
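[Editor's note: the "sparse vs. dense vector" intuition behind Cap'n Proto's packed encoding, mentioned earlier in the thread, can be illustrated without capnp at all. This is plain Python, not the pycapnp API - packing itself operates on zero bytes in the wire format, but the space-saving idea is the same:]

```python
# Dense binary representation: one slot per cell, almost all zeros.
dense = [0] * 2048
for i in (7, 190, 1033):
    dense[i] = 1

# Sparse representation: store only the active indices. Cap'n Proto's
# "packed" encoding exploits the same property by skipping runs of
# zero bytes in the serialized message.
sparse = [i for i, bit in enumerate(dense) if bit]

print(len(dense), len(sparse))  # 2048 slots vs. 3 indices
```

[HTM activity vectors are deliberately sparse (a few percent active), which is why both packing and external compression pay off so well on serialized models.]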
