Thanks for reiterating, this looks promising!

> On 13 Mar 2024, at 23:22, Jim Pivarski <jpivar...@gmail.com> wrote:
>
> So that this doesn't get lost amid the discussion:
> https://www.blosc.org/python-blosc2/python-blosc2.html
>
> Blosc is on-the-fly compression, which is a more extreme way of making variable-sized integers. The compression is in small chunks that fit into CPU cache, such that it's random access per chunk. The compression is lightweight enough that it can be faster to decompress, edit, and recompress a chunk than it is to copy from RAM, edit, and copy back to RAM. (The extra cost of compression is paid for by moving less data between RAM and CPU. That's why I say "can be": it depends on the entropy of the data.) Since you have to copy data from RAM to CPU and back anyway, as part of any operation on an array, this can be a net win.
>
> What you're trying to do with variable-length integers is a kind of compression algorithm, an extremely lightweight one. That's why I think Blosc would fit your use case: it's doing the same kind of thing, but with years of development behind it.
>
> (Earlier, I recommended bcolz, which was a Python array based on Blosc, but now I see that it has been deprecated. However, the link above goes to the current version of the Python interface to Blosc, so I'd expect it to cover the same use cases.)
>
> -- Jim
>
> On Wed, Mar 13, 2024 at 4:45 PM Dom Grigonis <dom.grigo...@gmail.com> wrote:
> My array is growing in a manner of:
>
> array[slice] += values
>
> so for now I will just clip values:
>
> res = np.add(array[slice], values, dtype=np.int64)
> array[slice] = res
> mask = res > MAX_UINT16
> array[slice][mask] = MAX_UINT16
>
> For this case, these large values do not have that much impact, and the extra operation overhead is acceptable.
>
> ---
>
> And I am adding a more involved project to my TODOs for the future.
>
> After all, it would be good to have an array which (preferably at as low a cost as possible) could handle anything you throw at it, with near-optimal memory consumption and sensible precision handling, while keeping all the benefits of numpy.
>
> Time will tell whether that is achievable. If anyone has any good ideas regarding this, I am all ears.
>
> Many thanks to you all for the information and ideas.
> dgpb
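For anyone skimming the thread, here is a slightly tidied version of the clipping above, using np.clip instead of a separate mask. It is an untested sketch; MAX_UINT16 and the helper name are only for illustration. Note also that the array[slice][mask] = MAX_UINT16 line above only works because slice is a basic slice, so array[slice] is a view; with fancy indexing that masked write would land on a temporary copy and be lost.

import numpy as np

MAX_UINT16 = np.iinfo(np.uint16).max   # 65535

def saturating_add(array, index, values):
    # widen to int64 so the sum cannot wrap around before we clip
    res = np.add(array[index], values, dtype=np.int64)
    np.clip(res, 0, MAX_UINT16, out=res)
    array[index] = res   # everything fits now, cast back down to uint16
    return array

a = np.full(5, 65000, dtype=np.uint16)
saturating_add(a, slice(1, 4), 1000)
print(a)   # [65000 65535 65535 65535 65000]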
>> On 13 Mar 2024, at 21:00, Homeier, Derek <dhom...@gwdg.de> wrote:
>>
>> On 13 Mar 2024, at 6:01 PM, Dom Grigonis <dom.grigo...@gmail.com> wrote:
>>>
>>> So my array sizes in this case are 3e8, thus 32-bit ints would be needed, so it is not a solution for this case.
>>>
>>> Nevertheless, such a concept would still be worthwhile for cases where integers are, say, at most 256 bits (or unlimited), even if memory addresses or offsets are 64-bit. This would both:
>>> a) save memory if many of the values in the array are much smaller than 256 bits
>>> b) provide a standard for dynamically unlimited-size values
>>
>> In principle one could encode the individual offsets in a smarter way, using just the minimal number of bits required, but again that would make random access impossible or very expensive – probably more or less amounting to what smart compression algorithms are already doing.
>>
>> Another approach might be to use the mask approach after all (or just flag all your uint8 data valued 2**8 - 1 as overflows) and store the correct (uint64 or whatever) values and their indices in a second array. That may still not vectorise very efficiently with just numpy if your typical operations are non-local.
>>
>> Derek
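If I understand Derek's second suggestion correctly, it would look roughly like the toy sketch below. The class, the sentinel handling and the dict-based side table are my own illustration (Derek suggests storing indices and values in a second array; a dict is just the simplest way to write it down), and I have not tested it.

import numpy as np

SENTINEL = np.iinfo(np.uint8).max   # 255 flags "real value lives in the side table"

class OverflowArray:
    # dense uint8 storage plus a sparse side table of exact values
    def __init__(self, size):
        self.small = np.zeros(size, dtype=np.uint8)
        self.big = {}                    # index -> exact value

    def set(self, i, value):
        if value >= SENTINEL:
            self.small[i] = SENTINEL     # mark the slot as overflowed
            self.big[i] = int(value)
        else:
            self.small[i] = value
            self.big.pop(i, None)

    def get(self, i):
        v = self.small[i]
        return self.big[i] if v == SENTINEL else int(v)

arr = OverflowArray(10)
arr.set(3, 100)        # fits in uint8
arr.set(7, 10**9)      # spills into the side table
print(arr.get(3), arr.get(7))   # 100 1000000000

As Derek notes, anything that has to touch both containers at once will probably not vectorise well with numpy alone, so this only pays off when overflows are rare.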
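And coming back to the Blosc suggestion at the top: below is roughly the first thing I intend to try. It is an untested sketch based on the python-blosc2 docs linked above, so please double-check the exact signatures of blosc2.compress / blosc2.decompress. The NDArray container described in those docs is what should give the per-chunk random access Jim mentions; the snippet only shows a whole-buffer round trip.

import numpy as np
import blosc2   # pip install blosc2

# counter-like data: mostly small values, so it should compress well
arr = np.random.poisson(3, size=10_000_000).astype(np.uint16)

# typesize tells Blosc the element width, so its shuffle filter can
# rearrange bytes into like-significance planes before compressing
comp = blosc2.compress(arr, typesize=arr.itemsize)
print(len(comp) / arr.nbytes)   # rough compression ratio

back = np.frombuffer(blosc2.decompress(comp), dtype=arr.dtype)
assert np.array_equal(arr, back)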