> OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.
>
> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.
>
> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)

I have also been curious about the new DTypes mechanism and whether we
could do non-byte-sized DTypes with it. One use case I have specifically is
reading and writing non-byte-aligned data [1]. This would work very well
for that use case if the dtype knew how to read/write the proper bit size.
For my use case I wouldn't care too much if NumPy internally needs to
expand and store the data as full bytes, but being able to read a bitwise
binary stream into NumPy native dtypes for further processing would be
useful, I think (without having to resort to unpackbits and do
rearranging/packing to other types).

# Hypothetical: 'uint3' and 'uint5' are not existing NumPy dtypes
dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
# x would have two unsigned ints, but reading only one byte from the stream
x = np.frombuffer(buffer, dtype)
# would be ideal to get tobytes() to know how to pack a uint3+uint5 DType
# into a single byte as well
x.tobytes()
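
For comparison, here is a minimal sketch of how the one-byte uint3+uint5
record above can be handled today with plain uint8 arithmetic. The helper
names (unpack_3_5, pack_3_5) and the assumption that count0 occupies the
three most significant bits are illustrative only, not an existing NumPy
API:

```python
import numpy as np

# Byte-sized stand-in for the hypothetical uint3 + uint5 record:
# each field is expanded to a full uint8 once in memory.
UNPACKED = np.dtype([("count0", np.uint8), ("count1", np.uint8)])


def unpack_3_5(buffer):
    """Expand each packed byte into a (count0, count1) record."""
    raw = np.frombuffer(buffer, dtype=np.uint8)
    out = np.empty(raw.shape, dtype=UNPACKED)
    out["count0"] = raw >> 5            # top 3 bits (assumed layout)
    out["count1"] = raw & 0b0001_1111   # low 5 bits
    return out


def pack_3_5(records):
    """Inverse of unpack_3_5: pack each record back into one byte."""
    c0 = records["count0"] & 0b111          # keep only 3 bits
    c1 = records["count1"] & 0b0001_1111    # keep only 5 bits
    return ((c0 << 5) | c1).astype(np.uint8).tobytes()


buffer = bytes([0b101_00011, 0b000_11111])
x = unpack_3_5(buffer)
assert pack_3_5(x) == buffer  # round trip preserves the packed bytes
```

This only covers records that fit in a single byte; fields that straddle
byte boundaries would still need something like unpackbits.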

Greg

[1] Specifically, this is for very low bandwidth satellite data where we
try to pack as much information as possible into the downlink and use every
bit of space, but once on the ground I can expand the bit-size fields to
byte-size fields without worrying too much about space [puns intended].


On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg <sebast...@sipsolutions.net>
wrote:

> On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> > Thanks! That does indeed look like a promising approach! And for sure it
> > would be better to avoid having to reimplement the whole array-part and
> > only focus on the data types. (If successful, my idea of a project would
> > basically solve all the custom numerical types discussed, bfloat16, int2,
> > int4 etc.)
>
> OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.
>
> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.
>
> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)
>
> >
> > I understand that the following is probably a hard question to answer, but
> > is it expected that there will be work done on this in the "near" future
> > to fill any holes and possibly become more stable? For context, the current
> > plan on my side is to propose this as a student project for the spring, so
> > primarily asking for planning and describing the project a bit better.
>
>
> Well, it depends on what you need.  With the exception above, I doubt
> the "holes" will matter much in practice unless you are targeting a
> polished release rather than experimentation.
> But of course it may be that you run into something that is important
> for you, but doesn't yet quite work.
>
> I will note just dealing with the Python/NumPy C-API can be a fairly
> steep learning curve, so you need someone comfortable to dive in and
> budget a good amount of time for that part.
> And yes, this is pretty new, so there may be stumbling stones (which I
> am happy to discuss in NumPy issues or directly).
>
> - Sebastian
>
>
> >
> > BR Oscar
> >
> > On Thu, 10 Nov 2022 at 15:13, Sebastian Berg <
> > sebast...@sipsolutions.net> wrote:
> >
> > > On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > > > > On Thu, 10 Nov 2022 at 13:10, Sebastian Berg <
> > > > > sebast...@sipsolutions.net> wrote:
> > > >
> > > > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > > > >
> > > > > > > > I'm not an expert, but I never encountered rounding floating point
> > > > > > > > numbers in bases different from 2 and 10.
> > > > > > >
> > > > > >
> > > > > > > I agree that this is probably not very common. More a possibility if
> > > > > > > one would supply a base argument to around.
> > > > > > >
> > > > > > > However, it is worth noting that Matlab has the quant function,
> > > > > > > https://www.mathworks.com/help/deeplearning/ref/quant.html which
> > > > > > > basically supports arbitrary bases (as a special case of an even more
> > > > > > > general approach). So there may be other use cases (although the
> > > > > > > example basically just implements around(x, 1)).
> > > > >
> > > > >
> > > > > > To be honest, hearing hardware design and data compression does make me
> > > > > > lean towards it not being mainstream enough that inclusion in NumPy
> > > > > > really makes sense.  But happy to hear opposing opinions.
> > > > >
> > > >
> > > > > Here I can easily argue that "all" computations are limited by finite
> > > > > word length and as soon as you want to see the effect of any type of
> > > > > format not supported out of the box, it will be beneficial. (Strictly, it
> > > > > makes more sense to quantize to a given number of bits than a given number
> > > > > of decimal digits, as we cannot represent most of those exactly.)  But I
> > > > > may not do that.
> > > >
> > > >
> > > > > > It would be nice to have more of a culture around ufuncs that do not
> > > > > > live in NumPy.  (I suppose at some point it was more difficult to do
> > > > > > C-extension, but that is many years ago).
> > > > >
> > > >
> > > > > I do agree with this though. And this got me realizing that maybe what I
> > > > > actually would like to do is to create an array-library with fully
> > > > > customizable (numeric) data types instead. That is, sort of, the proper
> > > > > way to do it, although the proposed approach is indeed simpler and in
> > > > > most cases will work well enough.
> > > > >
> > > > > (Am I right in believing that it is not that easy to piggy-back custom
> > > > > data types onto NumPy arrays? Something different from using object as
> > > > > dtype or the "struct-like" custom approach using the existing scalar
> > > > > types.)
> > >
> > > > NumPy is pretty much fully customizable (beyond just numeric data types).
> > > > Admittedly, to not have weird edge cases and have more power you have
> > > > to use the new API (NEP 41-43 [1]) and that is "experimental" and may
> > > > have some holes.
> > > > "Experimental" doesn't mean it is expected to change significantly,
> > > > just that you can't ship your stuff broadly really.
> > >
> > > The holes may matter for some complicated dtypes (custom memory
> > > allocation, parametric...). But at this point many should be rather
> > > fixable, so before you do your own give NumPy a chance?
> > >
> > > - Sebastian
> > >
> > >
> > > [1] https://numpy.org/neps/nep-0041-improved-dtype-support.html
> > >
> > > >
> > > > BR Oscar Gustafsson
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > > Member address: sebast...@sipsolutions.net
> > >
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > Member address: oscar.gustafs...@gmail.com
> > >
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sebast...@sipsolutions.net
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: greg.m.lu...@gmail.com
>
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to