I did a quick search for answers about Cap'n Proto: https://capnproto.org/encoding.html
See PACKING, a compression technique Cap'n Proto itself offers: it simply
discards excess zeros (think sparse vs. dense vectors). Scott: are we
enabling this? IMHO we should from the start, to avoid backwards-compatibility
issues later.

COMPRESSION: for repetitive data, which HTM networks typically produce, they
suggest using an external compression tool. Matt, this is something that can
(and should) be implemented regardless of capnp/pickle: the OPF code
responsible for writing the file should afterwards call a compression
program. In my experience it helps enormously on the current (pickle) data.

On Fri, Dec 18, 2015 at 4:22 PM, Matthew Taylor <[email protected]> wrote:

> BTW, we have been working on a new serialization format. The old one
> uses Python's pickle functionality, and there are several problems with
> it. The new method in NuPIC will use Cap'n Proto serialization, which is
> a very fast and efficient technique that happens on the C++ side (and
> through the pycapnp adapter in Python).
>
> Once we have this finished, the time it takes to save and retrieve
> models should decrease by about tenfold (based on Scott's initial
> experiments). I assume this will come along with a considerable decrease
> in serialized size on disk, but I have not checked. If Scott is reading
> this, maybe he can answer.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Dec 18, 2015 at 2:46 AM, David Ray <[email protected]> wrote:
>
>> Hi Karin,
>>
>> the network can't really grow new connections, which are not yet
>> stored in the memory, right? (other than adjusting weights of the
>> connections)
>>
>> The network does in fact grow new connections: Distal Dendrites are
>> formed, with Synapses housing new connections to other Cells. This is
>> one of the most distinguishing features of HTM Neurons as opposed to
>> "point neurons" (i.e. A-to-Z NNs, a.k.a. "Deep" Neural Networks).
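[Editor's note: the connection growth David describes can be sketched in toy
form as below. The classes are hypothetical illustrations only, not the actual
nupic temporal_memory.py API; the point is that training creates new Segment
and Synapse objects, which is one reason a trained model serializes larger
than a fresh one.]

```python
# Toy sketch (hypothetical classes): an HTM cell growing distal
# dendrite segments, each holding synapses to other cells.

class Synapse:
    def __init__(self, presynaptic_cell, permanence):
        self.presynaptic_cell = presynaptic_cell  # index of the source cell
        self.permanence = permanence              # connection strength, 0.0-1.0

class Segment:
    """A distal dendrite segment: a set of synapses onto other cells."""
    def __init__(self):
        self.synapses = []

    def grow_synapses(self, winner_cells, initial_permanence=0.21):
        # New state is created here -- this is why a trained network
        # occupies more memory (and disk) than a freshly built one.
        for cell in winner_cells:
            self.synapses.append(Synapse(cell, initial_permanence))

class Cell:
    def __init__(self):
        self.segments = []

    def grow_segment(self):
        segment = Segment()
        self.segments.append(segment)
        return segment

cell = Cell()
seg = cell.grow_segment()
seg.grow_synapses(winner_cells=[3, 17, 42])
print(len(cell.segments), len(seg.synapses))  # prints: 1 3
```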
>>
>> See:
>> https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py#L361
>>
>> ...starting from the "pickCellsToLearnOn()" method above...
>>
>> Cheers,
>> David
>>
>> Sent from my iPhone
>>
>> On Dec 18, 2015, at 4:10 AM, Karin Valisova <[email protected]> wrote:
>>
>> Thank you for your answers!
>>
>> Matthew, what do you mean by "how much data the model has seen"? I have
>> noticed that the size of the network increases with the size of the
>> data sample, but I can't really see a reason for that: the network
>> can't really grow new connections that are not yet stored in memory,
>> right? (other than adjusting the weights of the connections) And if
>> it's a matter of the model accumulating data somewhere, for calculating
>> sliding-window metrics or things like that, then it could theoretically
>> be cut off, if we're talking only about the network's ability to
>> process data.
>>
>> Mark, what kind of compression do you have in mind? Any ideas what to
>> try?
>>
>> Thank you,
>> Karin
>>
>> On Thu, Dec 17, 2015 at 7:29 PM, Marek Otahal <[email protected]> wrote:
>>
>>> Hi Karin,
>>>
>>> Yes, that is an issue! I've suggested using compression; it helps
>>> surprisingly well in this matter (from hundreds of MB down to tens).
>>> AFAIK it's not implemented yet.
>>>
>>> Cheers,
>>> Mark
>>>
>>> On Thu, Dec 17, 2015 at 6:15 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> That's not too surprising ;). The size of a saved model depends on
>>>> several things, including the number of input fields, model
>>>> parameters that affect how cells connect, and how much data the model
>>>> has seen. There are thousands of connections between cells that need
>>>> to be persisted when a model is saved. I have seen serialized models
>>>> much larger than 50 MB.
>>>>
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>> On Thu, Dec 17, 2015 at 8:06 AM, Karin Valisova <[email protected]> wrote:
>>>> > Hello!
>>>> >
>>>> > I've been playing around with serialization under the OPF
>>>> > framework, and I noticed that when using the typical model for
>>>> > temporal anomaly detection
>>>> >
>>>> > https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py
>>>> >
>>>> > the size of the saved file gets surprisingly large, ~50 MB. What is
>>>> > the reason for this? If I understand correctly, only the states of
>>>> > the temporal and spatial poolers should be enough to reload a
>>>> > network, right? Or am I forgetting about some extra data being
>>>> > stored?
>>>> >
>>>> > Thank you!
>>>> > Karin
>>>
>>> --
>>> Marek Otahal :o)
>>
>>
>> --
>>
>> datapine GmbH
>> Skalitzer Straße 33
>> 10999 Berlin
>>
>> email: [email protected]
>

--
Marek Otahal :o)
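
[Editor's note: the external-compression step Marek suggests earlier in the
thread — write the serialized model, then run it through a compressor — could
be sketched as below. File names are hypothetical, and the stdlib's gzip
stands in for "an external compression program"; repetitive HTM state
compresses very well under any such tool.]

```python
# Hedged sketch: compress a model checkpoint after serializing it.
# Hypothetical paths; gzip stands in for an external compressor.

import gzip
import os
import pickle
import shutil

def save_compressed(model_state, path):
    """Pickle the model state, then gzip the result on disk."""
    with open(path, "wb") as f:
        pickle.dump(model_state, f)
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)  # keep only the compressed copy

def load_compressed(path):
    """Decompress and unpickle a checkpoint saved by save_compressed."""
    with gzip.open(path + ".gz", "rb") as f:
        return pickle.load(f)

# Highly repetitive data (like many near-identical permanence values)
# shrinks dramatically under compression:
state = {"permanences": [0.21] * 100000}
save_compressed(state, "model.pkl")
restored = load_compressed("model.pkl")
assert restored == state
```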
