No problem Kevin. Thank you for sharing the information with your colleagues. All comments are much appreciated.
As there were no additional comments/suggestions to the spec itself, I will open up another voting thread today.

Thanks all!
Alenka

On Tue, Feb 28, 2023 at 11:11 AM Kevin Gurney <kgur...@mathworks.com> wrote:

> Hi Alenka,
>
> Thank you. I've informed my colleagues at MathWorks to add any further
> comments to the PR.
>
> My apologies for bringing this up on the voting thread.
>
> Best Regards,
>
> Kevin Gurney
>
> ________________________________
> From: Alenka Frim <ale...@voltrondata.com.INVALID>
> Sent: Tuesday, February 28, 2023 4:19 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
>
> This was actually already meant as the voting thread, but given it sparked
> some more discussion, let's give this a few more days, and then re-start
> with a new vote thread.
>
> *So if someone still has comments on the current text, please bring those
> up here or in the PR*: https://github.com/apache/arrow/pull/33925
>
> Alenka
>
> On Fri, Feb 24, 2023 at 10:15 AM Kevin Gurney <kgur...@mathworks.com>
> wrote:
>
> > Hi All,
> >
> > Thank you very much for creating this proposal, Alenka!
> >
> > I noticed the following in the notes [1] shared from the February 15th
> > Arrow Community Meeting:
> >
> > "Members of Hugging Face, Ray, and PyTorch community have given input and
> > some of it was incorporated - It would be good to have input from some
> > other companies and project communities including Lance, NumPy, Posit,
> > MATLAB, DLPack, CUDA/RAPIDS, Arrow Rust, Xarray, Julia, Fortran,
> > TensorFlow, LinkedIn"
> >
> > Based on the inclusion of MATLAB in the list above, I've shared this
> > proposal with some colleagues at MathWorks who have expertise in the deep
> > learning area. They will respond here if they have any additional input
> > to add.
> >
> > That being said, I recognize that this proposal is already nearing the
> > voting phase.
> >
> > [1] https://lists.apache.org/thread/bblcwwq7gl1x2hsr1qsormv9f3vr23jn
> >
> > Best Regards,
> >
> > Kevin Gurney
> >
> > ________________________________
> > From: Rok Mihevc <rok.mih...@gmail.com>
> > Sent: Thursday, February 23, 2023 8:12 AM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
> >
> > That makes sense indeed.
> > Do we have any more comments on the language of the proposal [1] or
> > should we proceed to vote?
> >
> > Rok
> >
> > [1] https://github.com/apache/arrow/pull/33925/files
> >
> > On Wed, Feb 22, 2023 at 2:13 PM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> > >
> > > That's a good point.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 22/02/2023 at 14:11, Dewey Dunnington wrote:
> > > > I don't think having both dimension names and permutation is
> > > > redundant...dimension names can also serve as human-readable tags
> > > > that help a human interpret the values. If reading a NetCDF, for
> > > > example, one might store the dimension variable names. When
> > > > determining type equality it may be useful that
> > > > {..., permutation = [2, 0, 1], dim_names = ["C", "H", "W"]}
> > > > is not equal to
> > > > {..., permutation = [2, 0, 1], dim_names = ["x", "y", "z"]}.
> > > >
> > > > On Wed, Feb 22, 2023 at 4:56 AM Rok Mihevc <rok.mih...@gmail.com>
> > > > wrote:
> > > >
> > > >>>
> > > >>>>>
> > > >>>>> Should we rule that `dim_names` and `permutation` are mutually
> > > >>>>> exclusive?
> > > >>>>>
> > > >>>>
> > > >>>> Since `dim_names` have to "map to the physical layout (row-major)"
> > > >>>> that means permutation will always be trivial which indeed makes
> > > >>>> it unnecessary to store both.
> > > >>>
> > > >>> I don't think it is necessarily needed to explicitly make them
> > > >>> mutually exclusive. I don't know how useful this would be in
> > > >>> practice, but you certainly *can* specify both in a meaningful way.
> > > >>> Re-using the example of NHWC data, which is physically stored as
> > > >>> NCHW, you can keep track of this by specifying a permutation of
> > > >>> [2, 0, 1], but at the same time you could also still save the
> > > >>> dimension names as ["C", "H", "W"].
> > > >>>
> > > >>
> > > >> I'll advocate for the original comment, but I'm ok either way.
> > > >> Having both `dim_names` and `permutation` is redundant - if the user
> > > >> knows their desired order of `dim_names` they can derive the
> > > >> permutation. If they don't use `dim_names` they probably don't want
> > > >> them.
> > > >>
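
For readers following the `dim_names` / `permutation` discussion quoted above, here is a minimal Python sketch of the two points being made: that a permutation can be derived from a known ordering of names, and that two parameter sets with the same permutation but different names need not compare equal. Everything in it is an assumption for illustration: `TensorParams` and `derive_permutation` are hypothetical names and not part of any Arrow API, the shape values are made up, and the convention used (entry i of the permutation is the logical index of the i-th physical dimension) is simply the one that reproduces the [2, 0, 1] example from the thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorParams:
    """Hypothetical stand-in for the type parameters (not an Arrow API)."""
    shape: Tuple[int, ...]
    dim_names: Optional[Tuple[str, ...]] = None
    permutation: Optional[Tuple[int, ...]] = None


def derive_permutation(physical_names, logical_names):
    """For each physical dimension, return its index in the logical order."""
    if sorted(physical_names) != sorted(logical_names):
        raise ValueError("both orders must contain the same dimension names")
    return tuple(logical_names.index(name) for name in physical_names)


# Assumed layout from the thread: physical order C, H, W; logical order H, W, C.
perm = derive_permutation(("C", "H", "W"), ("H", "W", "C"))

# Same shape and permutation, different names (shape values are illustrative).
chw = TensorParams(shape=(3, 224, 224), dim_names=("C", "H", "W"), permutation=perm)
xyz = TensorParams(shape=(3, 224, 224), dim_names=("x", "y", "z"), permutation=perm)

print(perm)        # (2, 0, 1)
print(chw == xyz)  # False
```

Under these assumptions the names carry information the permutation alone does not, which is the argument for allowing both; whether an implementation should treat such types as unequal is a design decision for the spec, not something this sketch settles.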