I did a quick search for answers about Cap'n Proto: https://capnproto.org/encoding.html
See PACKING, a compression technique Cap'n Proto itself offers: it simply
discards excess zeros (think sparse vs. dense vectors). Scott: are we
enabling this? IMHO we should from the start, to avoid backwards-compatibility
issues later.

COMPRESSION: for repetitive data, which HTM networks typically produce, they
suggest using an external compression tool. Matt, this is something that can
(and should) be implemented regardless of capnp/pickle: the OPF code
responsible for writing the file should afterwards call a compression
program. In my experience it helps enormously on the current (pickle) data.

On Fri, Dec 18, 2015 at 4:22 PM, Matthew Taylor <[email protected]> wrote:

> BTW, we have been working on a new serialization format. The old one
> uses Python's pickle functionality, and there are several problems with
> it. The new method in NuPIC will use Cap'n Proto serialization, which is
> a very fast and efficient technique that happens on the C++ side (and
> through the pycapnp adapter in Python).
>
> Once we have this finished, the time it takes to save and retrieve
> models should decrease by about tenfold (based on Scott's initial
> experiments). I assume this will come along with a considerable decrease
> in serialized size on disk, but I have not checked. If Scott is reading
> this, maybe he can answer.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Dec 18, 2015 at 2:46 AM, David Ray <[email protected]> wrote:
>
>> Hi Karin,
>>
>> the network can't really grow new connections, which are not yet
>> stored in the memory, right? (other than adjusting weights of the
>> connections)
>>
>> The network does in fact grow new connections: Distal Dendrites are
>> formed, with Synapses housing new connections to other Cells. This is
>> one of the most distinguishing features of HTM Neurons as opposed to
>> "point neurons" (i.e. A-to-Z NNs, a.k.a. "Deep" Neural Networks).
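[Editor's note: the connection growth David describes can be sketched in toy
form as below. The classes are hypothetical illustrations only, not the actual
nupic temporal_memory.py API; the point is that training creates new Segment
and Synapse objects, which is one reason a trained model serializes larger
than a fresh one.]

```python
# Toy sketch (hypothetical classes): an HTM cell growing distal
# dendrite segments, each holding synapses to other cells.

class Synapse:
    def __init__(self, presynaptic_cell, permanence):
        self.presynaptic_cell = presynaptic_cell  # index of the source cell
        self.permanence = permanence              # connection strength, 0.0-1.0

class Segment:
    """A distal dendrite segment: a set of synapses onto other cells."""
    def __init__(self):
        self.synapses = []

    def grow_synapses(self, winner_cells, initial_permanence=0.21):
        # New state is created here -- this is why a trained network
        # occupies more memory (and disk) than a freshly built one.
        for cell in winner_cells:
            self.synapses.append(Synapse(cell, initial_permanence))

class Cell:
    def __init__(self):
        self.segments = []

    def grow_segment(self):
        segment = Segment()
        self.segments.append(segment)
        return segment

cell = Cell()
seg = cell.grow_segment()
seg.grow_synapses(winner_cells=[3, 17, 42])
print(len(cell.segments), len(seg.synapses))  # prints: 1 3
```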
>>
>> See:
>> https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py#L361
>>
>> ...starting from the "pickCellsToLearnOn()" method above...
>>
>> Cheers,
>> David
>>
>> Sent from my iPhone
>>
>> On Dec 18, 2015, at 4:10 AM, Karin Valisova <[email protected]> wrote:
>>
>> Thank you for your answers!
>>
>> Matthew, what do you mean by "how much data the model has seen"? I have
>> noticed that the size of the network increases with the size of the
>> data sample, but I can't really see a reason for that: the network
>> can't really grow new connections that are not yet stored in memory,
>> right? (other than adjusting the weights of the connections) And if
>> it's a matter of the model accumulating data somewhere, for calculating
>> sliding-window metrics or things like that, then it could theoretically
>> be cut off, if we're talking only about the network's ability to
>> process data.
>>
>> Mark, what kind of compression do you have in mind? Any ideas what to
>> try?
>>
>> Thank you,
>> Karin
>>
>> On Thu, Dec 17, 2015 at 7:29 PM, Marek Otahal <[email protected]> wrote:
>>
>>> Hi Karin,
>>>
>>> Yes, that is an issue! I've suggested using compression; it helps
>>> surprisingly well in this matter (from hundreds of MB down to tens).
>>> AFAIK it's not implemented yet.
>>>
>>> Cheers,
>>> Mark
>>>
>>> On Thu, Dec 17, 2015 at 6:15 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> That's not too surprising ;). The size of a saved model depends on
>>>> several things, including the number of input fields, model
>>>> parameters that affect how cells connect, and how much data the model
>>>> has seen. There are thousands of connections between cells that need
>>>> to be persisted when a model is saved. I have seen serialized models
>>>> much larger than 50 MB.
>>>>
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>> On Thu, Dec 17, 2015 at 8:06 AM, Karin Valisova <[email protected]> wrote:
>>>> > Hello!
>>>> >
>>>> > I've been playing around with serialization under the OPF
>>>> > framework, and I noticed that when using the typical model for
>>>> > temporal anomaly detection
>>>> >
>>>> > https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py
>>>> >
>>>> > the size of the saved file gets surprisingly large, ~50 MB. What is
>>>> > the reason for this? If I understand correctly, only the states of
>>>> > the temporal and spatial poolers should be enough to reload a
>>>> > network, right? Or am I forgetting about some extra data being
>>>> > stored?
>>>> >
>>>> > Thank you!
>>>> > Karin
>>>
>>> --
>>> Marek Otahal :o)
>>
>>
>> --
>>
>> datapine GmbH
>> Skalitzer Straße 33
>> 10999 Berlin
>>
>> email: [email protected]
>

--
Marek Otahal :o)
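
[Editor's note: the external-compression step Marek suggests earlier in the
thread — write the serialized model, then run it through a compressor — could
be sketched as below. File names are hypothetical, and the stdlib's gzip
stands in for "an external compression program"; repetitive HTM state
compresses very well under any such tool.]

```python
# Hedged sketch: compress a model checkpoint after serializing it.
# Hypothetical paths; gzip stands in for an external compressor.

import gzip
import os
import pickle
import shutil

def save_compressed(model_state, path):
    """Pickle the model state, then gzip the result on disk."""
    with open(path, "wb") as f:
        pickle.dump(model_state, f)
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)  # keep only the compressed copy

def load_compressed(path):
    """Decompress and unpickle a checkpoint saved by save_compressed."""
    with gzip.open(path + ".gz", "rb") as f:
        return pickle.load(f)

# Highly repetitive data (like many near-identical permanence values)
# shrinks dramatically under compression:
state = {"permanences": [0.21] * 100000}
save_compressed(state, "model.pkl")
restored = load_compressed("model.pkl")
assert restored == state
```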
