On Fri, 2022-11-11 at 09:13 -0700, Greg Lucas wrote:
> > OK, more below.  But unfortunately `int2` and `int4` *are*
> > problematic, because the NumPy array uses a byte-sized strided
> > layout, so you would have to store them in a full byte, which is
> > probably not what you want.
> >
> > I am always thinking of adding a provision for it in the DTypes so
> > that someone could use part of the NumPy machine to make an array
> > that can have non-byte sized strides, but the NumPy array itself is
> > ABI incompatible with storing these packed :(.
> >
> > (I.e. we could plug that "hole" to allow making an int4 DType in
> > NumPy, but it would still have to take 1-byte storage space when
> > put into a NumPy array, so I am not sure there is much of a point.)
>
> I have also been curious about the new DTypes mechanism and whether
> we could do non byte-size DTypes with it. One use-case I have
> specifically is for reading and writing non byte-aligned data [1].
> So, this would work very well for that use-case if the dtype knew how
> to read/write the proper bit-size. For my use-case I wouldn't care
> too much if internally Numpy needs to expand and store the data as
> full bytes, but being able to read a bitwise binary stream into Numpy
> native dtypes for further processing would be useful I think (without
> having to resort to unpackbits and do rearranging/packing to other
> types).
>
> dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
> # x would have two unsigned ints, but reading only one byte from the
> # stream
> x = np.frombuffer(buffer, dtype)
> # would be ideal to get tobytes() to know how to pack a uint3+uint5
> # DType into a single byte as well
> x.tobytes()
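One way to handle the `uint3` + `uint5` example quoted above without a custom DType is a pair of plain conversion functions built on `uint8` bit operations. The following is only a minimal sketch: the function names `unpack_u3_u5` and `pack_u3_u5` are hypothetical, and the bit order (the 3-bit `count0` in the high bits of each byte, the 5-bit `count1` in the low bits) is an assumption, since a real downlink format would dictate its own layout.

```python
import numpy as np


def unpack_u3_u5(buffer):
    """Expand each packed byte into two byte-sized fields.

    Assumed layout (not from the thread): count0 = top 3 bits,
    count1 = bottom 5 bits of every byte.
    """
    raw = np.frombuffer(buffer, dtype=np.uint8)
    out = np.empty(raw.shape,
                   dtype=[("count0", np.uint8), ("count1", np.uint8)])
    out["count0"] = raw >> 5      # keep the upper 3 bits
    out["count1"] = raw & 0x1F    # keep the lower 5 bits
    return out


def pack_u3_u5(records):
    """Pack the two fields back into one byte per record."""
    c0 = records["count0"].astype(np.uint8)
    c1 = records["count1"].astype(np.uint8)
    if (c0 > 7).any() or (c1 > 31).any():
        raise ValueError("value does not fit in 3 or 5 bits")
    return ((c0 << 5) | c1).astype(np.uint8).tobytes()


data = bytes([0b10100011, 0b11111111])
rec = unpack_u3_u5(data)
print(rec["count0"], rec["count1"])
print(pack_u3_u5(rec) == data)
```

The expanded array stores each field in a full byte, so it costs twice the memory of the packed stream, but it behaves like any ordinary structured array, including `rec["count0"]` field access.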

Unfortunately, I suspect the amount of expectations users would have
from a full DType, and the fact that bit-sized types will be a bit
awkward in NumPy arrays for the foreseeable future, make me think
dedicated conversion functions are probably more practical.

Yes, you could do a `MyInt(bits=5, offset=3)` DType and at least you
could view the same array also with `MyInt(bits=3, offset=0)`.
(Maybe also a structured DType, but I am not certain that is advisable
and custom structured DTypes would require holes to be plugged.)

A custom dtype that is "structured" might work (i.e. you could store
two numbers in one byte of course).
Currently you cannot integrate deep enough into NumPy to build
structured dtypes based on arbitrary other dtypes, but you could do it
for your own bit DType.
(I am not quite sure you can make `arr["count0"]` work, this is a hole
that needs plugging.)

This is probably not a small task though.

Could `tobytes()` be made to compactify?  Yes, but then it suddenly
needs extra logic for bit-sized types and doesn't just expose memory.
That is maybe fine, but also seems a bit awkward?

I would love to have a better answer, but dancing around the
byte-strided ABI seems tricky...

Anyway, I am always available to discuss such possibilities, there are
some corners w.r.t. such bit-sized thoughts which are still shrouded
in fog.

- Sebastian


>
> Greg
>
> [1] Specifically, this is for very low bandwidth satellite data where
> we try to pack as much information in the downlink and use every bit
> of space, but once on the ground I can expand the bit-size fields to
> byte-size fields without too much issue of worrying about space [puns
> intended].
>
>
> On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
>
> > On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> > > Thanks! That does indeed look like a promising approach!
> > > And for sure it would be better to avoid having to reimplement
> > > the whole array-part and only focus on the data types. (If
> > > successful, my idea of a project would basically solve all the
> > > custom numerical types discussed, bfloat16, int2, int4 etc.)
> >
> > OK, more below.  But unfortunately `int2` and `int4` *are*
> > problematic, because the NumPy array uses a byte-sized strided
> > layout, so you would have to store them in a full byte, which is
> > probably not what you want.
> >
> > I am always thinking of adding a provision for it in the DTypes so
> > that someone could use part of the NumPy machine to make an array
> > that can have non-byte sized strides, but the NumPy array itself is
> > ABI incompatible with storing these packed :(.
> >
> > (I.e. we could plug that "hole" to allow making an int4 DType in
> > NumPy, but it would still have to take 1-byte storage space when
> > put into a NumPy array, so I am not sure there is much of a point.)
> >
> > > I understand that the following is probably a hard question to
> > > answer, but is it expected that there will be work done on this
> > > in the "near" future to fill any holes and possibly become more
> > > stable? For context, the current plan on my side is to propose
> > > this as a student project for the spring, so primarily asking for
> > > planning and describing the project a bit better.
> >
> > Well, it depends on what you need.  With the exception above, I
> > doubt the "holes" will matter much in practice unless you are
> > targeting a polished release rather than experimentation.
> > But of course it may be that you run into something that is
> > important for you, but doesn't yet quite work.
> >
> > I will note just dealing with the Python/NumPy C-API can be a
> > fairly steep learning curve, so you need someone comfortable to
> > dive in and budget a good amount of time for that part.
> > And yes, this is pretty new, so there may be stumbling stones
> > (which I am happy to discuss in NumPy issues or directly).
> >
> > - Sebastian
> >
> > > BR Oscar
> > >
> > > On Thu, 10 Nov 2022 at 15:13, Sebastian Berg
> > > <sebast...@sipsolutions.net> wrote:
> > >
> > > > On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > > > > On Thu, 10 Nov 2022 at 13:10, Sebastian Berg
> > > > > <sebast...@sipsolutions.net> wrote:
> > > > >
> > > > > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > > > > > I'm not an expert, but I never encountered rounding
> > > > > > > > floating point numbers in bases different from 2 and
> > > > > > > > 10.
> > > > > > >
> > > > > > > I agree that this is probably not very common. More a
> > > > > > > possibility if one would supply a base argument to
> > > > > > > around.
> > > > > > >
> > > > > > > However, it is worth noting that Matlab has the quant
> > > > > > > function,
> > > > > > > https://www.mathworks.com/help/deeplearning/ref/quant.html
> > > > > > > which basically supports arbitrary bases (as a special
> > > > > > > case of an even more general approach). So there may be
> > > > > > > other use cases (although the example basically just
> > > > > > > implements around(x, 1)).
> > > > > >
> > > > > > To be honest, hearing hardware design and data compression
> > > > > > does make me lean towards it not being mainstream enough
> > > > > > that inclusion in NumPy really makes sense.  But happy to
> > > > > > hear opposing opinions.
> > > > >
> > > > > Here I can easily argue that "all" computations are limited
> > > > > by finite word length and as soon as you want to see the
> > > > > effect of any type of format not supported out of the box, it
> > > > > will be beneficial. (Strictly, it makes more sense to
> > > > > quantize to a given number of bits than a given number of
> > > > > decimal digits, as we cannot represent most of those
> > > > > exactly.) But I may not do that.
> > > > >
> > > > > > It would be nice to have more of a culture around ufuncs
> > > > > > that do not live in NumPy.  (I suppose at some point it was
> > > > > > more difficult to do C-extension, but that is many years
> > > > > > ago).
> > > > >
> > > > > I do agree with this though. And this got me realizing that
> > > > > maybe what I actually would like to do is to create an
> > > > > array-library with fully customizable (numeric) data types
> > > > > instead. That is, sort of, the proper way to do it, although
> > > > > the proposed approach is indeed simpler and in most cases
> > > > > will work well enough.
> > > > >
> > > > > (Am I right in believing that it is not that easy to
> > > > > piggy-back custom data types onto NumPy arrays? Something
> > > > > different from using object as dtype or the "struct-like"
> > > > > custom approach using the existing scalar types.)
> > > >
> > > > NumPy is pretty much fully customizable (beyond just numeric
> > > > data types).
> > > > Admittedly, to not have weird edge cases and have more power
> > > > you have to use the new API (NEP 41-43 [1]) and that is
> > > > "experimental" and may have some holes.
> > > > "Experimental" doesn't mean it is expected to change
> > > > significantly, just that you can't ship your stuff broadly
> > > > really.
> > > >
> > > > The holes may matter for some complicated dtypes (custom memory
> > > > allocation, parametric...).  But at this point many should be
> > > > rather fixable, so before you do your own give NumPy a chance?
> > > >
> > > > - Sebastian
> > > >
> > > >
> > > > [1] https://numpy.org/neps/nep-0041-improved-dtype-support.html
> > > >
> > > > > BR Oscar Gustafsson
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > > > To unsubscribe send an email to
> > > > > numpy-discussion-le...@python.org
> > > > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > > > Member address: sebast...@sipsolutions.net