> OK, more below. But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.
>
> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.
>
> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)

I have also been curious about the new DTypes mechanism and whether we
could do non byte-size DTypes with it. One use-case I have specifically is
for reading and writing non byte-aligned data [1]. So, this would work very
well for that use-case if the dtype knew how to read/write the proper
bit-size. For my use-case I wouldn't care too much if internally Numpy
needs to expand and store the data as full bytes, but being able to read a
bitwise binary stream into Numpy native dtypes for further processing would
be useful I think (without having to resort to unpackbits and do
rearranging/packing to other types).

    dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
    # x would have two unsigned ints, but reading only one byte from the stream
    x = np.frombuffer(buffer, dtype)
    # would be ideal to get tobytes() to know how to pack a uint3+uint5 DType
    # into a single byte as well
    x.tobytes()

Greg

[1] Specifically, this is for very low bandwidth satellite data where we
try to pack as much information in the downlink and use every bit of space,
but once on the ground I can expand the bit-size fields to byte-size fields
without too much issue of worrying about space [puns intended].

On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:

> On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> > Thanks! That does indeed look like a promising approach! And for sure
> > it would be better to avoid having to reimplement the whole array-part
> > and only focus on the data types.
> > (If successful, my idea of a project would basically solve all the
> > custom numerical types discussed, bfloat16, int2, int4 etc.)
>
> OK, more below. But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.
>
> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.
>
> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)
>
> > I understand that the following is probably a hard question to answer,
> > but is it expected that there will be work done on this in the "near"
> > future to fill any holes and possibly become more stable? For context,
> > the current plan on my side is to propose this as a student project for
> > the spring, so primarily asking for planning and describing the project
> > a bit better.
>
> Well, it depends on what you need. With the exception above, I doubt
> the "holes" will matter much in practice unless you are targeting a
> polished release rather than experimentation.
> But of course it may be that you run into something that is important
> for you, but doesn't yet quite work.
>
> I will note just dealing with the Python/NumPy C-API can be a fairly
> steep learning curve, so you need someone comfortable to dive in and
> budget a good amount of time for that part.
> And yes, this is pretty new, so there may be stumbling stones (which I
> am happy to discuss in NumPy issues or directly).
>
> - Sebastian
>
> > BR Oscar
> >
> > On Thu, 10 Nov 2022 at 15:13, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> >
> > > On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > > > On Thu, 10 Nov 2022 at 13:10, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> > > >
> > > > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > > > > I'm not an expert, but I never encountered rounding floating
> > > > > > > point numbers in bases different from 2 and 10.
> > > > > >
> > > > > > I agree that this is probably not very common. More a
> > > > > > possibility if one would supply a base argument to around.
> > > > > >
> > > > > > However, it is worth noting that Matlab has the quant function,
> > > > > > https://www.mathworks.com/help/deeplearning/ref/quant.html which
> > > > > > basically supports arbitrary bases (as a special case of an even
> > > > > > more general approach). So there may be other use cases
> > > > > > (although the example basically just implements around(x, 1)).
> > > > >
> > > > > To be honest, hearing hardware design and data compression does
> > > > > make me lean towards it not being mainstream enough that
> > > > > inclusion in NumPy really makes sense. But happy to hear opposing
> > > > > opinions.
> > > >
> > > > Here I can easily argue that "all" computations are limited by
> > > > finite word length and as soon as you want to see the effect of any
> > > > type of format not supported out of the box, it will be beneficial.
> > > > (Strictly, it makes more sense to quantize to a given number of
> > > > bits than a given number of decimal digits, as we cannot represent
> > > > most of those exactly.) But I may not do that.
> > > >
> > > > > It would be nice to have more of a culture around ufuncs that do
> > > > > not live in NumPy. (I suppose at some point it was more difficult
> > > > > to do C-extension, but that is many years ago).
> > > >
> > > > I do agree with this though. And this got me realizing that maybe
> > > > what I actually would like to do is to create an array-library with
> > > > fully customizable (numeric) data types instead. That is, sort of,
> > > > the proper way to do it, although the proposed approach is indeed
> > > > simpler and in most cases will work well enough.
> > > >
> > > > (Am I right in believing that it is not that easy to piggy-back
> > > > custom data types onto NumPy arrays? Something different from using
> > > > object as dtype or the "struct-like" custom approach using the
> > > > existing scalar types.)
> > >
> > > NumPy is pretty much fully customizable (beyond just numeric data
> > > types).
> > > Admittedly, to not have weird edge cases and have more power you have
> > > to use the new API (NEP 41-43 [1]) and that is "experimental" and may
> > > have some holes.
> > > "Experimental" doesn't mean it is expected to change significantly,
> > > just that you can't ship your stuff broadly really.
> > >
> > > The holes may matter for some complicated dtypes (custom memory
> > > allocation, parametric...). But at this point many should be rather
> > > fixable, so before you do your own give NumPy a chance?
> > >
> > > - Sebastian
> > >
> > > [1] https://numpy.org/neps/nep-0041-improved-dtype-support.html
> > >
> > > > BR Oscar Gustafsson
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > > Member address: sebast...@sipsolutions.net
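
For the `uint3` + `uint5` use-case described at the top of this message, a bit-width-aware dtype does not exist in NumPy today, so the fields have to be split by hand. The following is only a minimal sketch of one possible workaround using shifts and masks instead of unpackbits. The bit layout (count0 in the upper 3 bits, count1 in the lower 5 bits) and the helper names `unpack_counts` / `pack_counts` are assumptions made for illustration; they do not come from the thread or from NumPy.

```python
import numpy as np

# Assumed layout (not specified in the thread): count0 occupies the upper
# 3 bits of each byte, count1 the lower 5 bits. Each field is expanded to
# a full uint8 once it is inside the NumPy array.
unpacked_dtype = np.dtype([("count0", np.uint8), ("count1", np.uint8)])


def unpack_counts(buffer):
    """Expand packed uint3+uint5 bytes into a structured array (hypothetical helper)."""
    raw = np.frombuffer(buffer, dtype=np.uint8)
    out = np.empty(raw.shape, dtype=unpacked_dtype)
    out["count0"] = raw >> 5            # upper 3 bits
    out["count1"] = raw & 0b00011111    # lower 5 bits
    return out


def pack_counts(records):
    """Inverse of unpack_counts: squeeze both fields back into one byte each."""
    high = (records["count0"] & 0b00000111) << 5
    low = records["count1"] & 0b00011111
    return (high | low).astype(np.uint8).tobytes()


buffer = bytes([0b10100011, 0b00111111])
x = unpack_counts(buffer)
print(x["count0"], x["count1"])
print(pack_counts(x) == buffer)
```

The structured array takes two bytes per record instead of one, which matches the "expand on the ground" trade-off mentioned in footnote [1] above.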
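
The point about byte-sized strides can be made concrete with a small experiment. This is a sketch only: the helpers `pack_int4` and `unpack_int4` are hypothetical names, and placing the first value of each pair in the high nibble is an assumed convention. It shows that an `int8` array already has the smallest stride NumPy can express (one byte), and that storing two 4-bit values per byte has to be done manually, outside the array's own element layout.

```python
import numpy as np

# Values that fit the signed 4-bit range [-8, 7], stored one per byte.
values = np.array([-8, -1, 0, 7], dtype=np.int8)
print(values.strides)   # stride is counted in whole bytes


def pack_int4(a):
    """Pack pairs of int4-range values into single bytes (hypothetical helper).

    Assumes an even number of elements; the first value of each pair goes
    into the high nibble (an assumed convention).
    """
    nibbles = a.view(np.uint8) & 0x0F          # keep the low 4 bits of each value
    return (nibbles[0::2] << 4) | nibbles[1::2]


def unpack_int4(packed):
    """Expand packed nibbles back into one int8 per value, restoring the sign."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    # Move each nibble to the top of its byte, reinterpret as signed, and
    # shift back down so the sign bit is extended.
    return (out << 4).view(np.int8) >> 4


packed = pack_int4(values)
print(packed.nbytes, values.nbytes)
print(unpack_int4(packed))
```

The packed buffer is half the size, but NumPy sees it only as a `uint8` array; indexing, slicing and ufuncs all operate on whole bytes rather than on the individual 4-bit values.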