+1 (binding)
On Tue, 7 Mar 2023 at 23:35, David Li <lidav...@apache.org> wrote:
>
> +1 (binding)
>
> Just one comment, though: since we also define a separate "Tensor" IPC
> structure in Arrow, maybe we should state the relationship somewhere in the
> documentation? (Even if the answer is "no relationship".)
>
> On Mon, Mar 6, 2023, at 18:58, Rok Mihevc wrote:
> > +1
> >
> > Thanks for the discussion everyone!
> >
> > Rok
> >
> > On Mon, Mar 6, 2023 at 8:29 PM Dewey Dunnington
> > <de...@voltrondata.com.invalid> wrote:
> >
> >> +1 (non-binding)!
> >>
> >> On Mon, Mar 6, 2023 at 9:59 AM Nic Crane <thisis...@gmail.com> wrote:
> >>
> >> > +1
> >> >
> >> > On Mon, 6 Mar 2023 at 12:41, Alenka Frim <ale...@voltrondata.com.invalid>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I am starting a new voting thread with this email as the first voting
> >> > > thread [1] opened up new comments and suggestions and we wanted to
> >> > > take time to see how that evolves.
> >> > >
> >> > > *I would like to propose we vote on adding the fixed shape tensor
> >> > > canonical extension type with the following specification:*
> >> > >
> >> > > Fixed shape tensor
> >> > > ==================
> >> > >
> >> > > * Extension name: `arrow.fixed_shape_tensor`.
> >> > >
> >> > > * The storage type of the extension: ``FixedSizeList`` where:
> >> > >
> >> > >   * **value_type** is the data type of individual tensor elements.
> >> > >   * **list_size** is the product of all the elements in tensor shape.
> >> > >
> >> > > * Extension type parameters:
> >> > >
> >> > >   * **value_type** = the Arrow data type of individual tensor elements.
> >> > >   * **shape** = the physical shape of the contained tensors
> >> > >     as an array.
> >> > >
> >> > >   Optional parameters describing the logical layout:
> >> > >
> >> > >   * **dim_names** = explicit names to tensor dimensions
> >> > >     as an array. The length of it should be equal to the shape
> >> > >     length and equal to the number of dimensions.
> >> > >
> >> > >     ``dim_names`` can be used if the dimensions have well-known
> >> > >     names and they map to the physical layout (row-major).
> >> > >
> >> > >   * **permutation** = indices of the desired ordering of the
> >> > >     original dimensions, defined as an array.
> >> > >
> >> > >     The indices contain a permutation of the values [0, 1, .., N-1]
> >> > >     where N is the number of dimensions. The permutation indicates
> >> > >     which dimension of the logical layout corresponds to which
> >> > >     dimension of the physical tensor (the i-th dimension of the
> >> > >     logical view corresponds to the dimension with number
> >> > >     ``permutations[i]`` of the physical tensor).
> >> > >
> >> > >     Permutation can be useful in case the logical order of
> >> > >     the tensor is a permutation of the physical order (row-major).
> >> > >
> >> > >     When logical and physical layout are equal, the permutation will
> >> > >     always be ([0, 1, .., N-1]) and can therefore be left out.
> >> > >
> >> > > * Description of the serialization:
> >> > >
> >> > >   The metadata must be a valid JSON object including shape of
> >> > >   the contained tensors as an array with key **"shape"** plus optional
> >> > >   dimension names with keys **"dim_names"** and ordering of the
> >> > >   dimensions with key **"permutation"**.
> >> > >
> >> > >   - Example: ``{ "shape": [2, 5]}``
> >> > >   - Example with ``dim_names`` metadata for NCHW ordered data:
> >> > >
> >> > >     ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``
> >> > >
> >> > >   - Example of permuted 3-dimensional tensor:
> >> > >
> >> > >     ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
> >> > >
> >> > >     This is the physical layout shape and the shape of the logical
> >> > >     layout would in this case be ``[500, 100, 200]``.
> >> > >
> >> > > .. note::
> >> > >
> >> > >   Elements in a fixed shape tensor extension array are stored
> >> > >   in row-major/C-contiguous order.
> >> > >
> >> > > * The specification is submitted as a PR [2] to Canonical Extension
> >> > >   Types document under the format specifications directory [3].
> >> > >
> >> > > There are also two implementations submitted to Apache Arrow
> >> > > repository:
> >> > > * C++ implementation of the proposed specification [4]
> >> > > * Python example implementation of the proposed specification and
> >> > >   usage (only illustrative) [5]
> >> > >
> >> > > The vote will be open for at least 72 hours.
> >> > >
> >> > > [ ] +1 Accept this proposal
> >> > > [ ] +0
> >> > > [ ] -1 Do not accept this proposal because...
> >> > >
> >> > > Regards, Alenka
> >> > >
> >> > > [1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
> >> > > [2]: https://github.com/apache/arrow/pull/33925/files
> >> > > [3]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
> >> > > [4]: https://github.com/apache/arrow/pull/8510/files
> >> > > [5]: https://github.com/apache/arrow/pull/33948/files
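
For readers skimming the thread, here is a minimal sketch of how the metadata in the quoted proposal could be interpreted. It only uses the Python standard library; the function name `parse_fixed_shape_tensor_metadata` and the returned dictionary layout are hypothetical and are not part of the specification, the C++ implementation, or any Arrow API. It shows how `list_size` and the logical shape follow from `shape` and `permutation` as described above.

```python
import json
from math import prod


def parse_fixed_shape_tensor_metadata(serialized):
    """Hypothetical helper: decode the JSON metadata described in the proposal."""
    meta = json.loads(serialized)
    shape = meta["shape"]
    ndim = len(shape)

    # Per the proposal, a missing permutation means the logical and
    # physical layouts are equal, i.e. [0, 1, .., N-1].
    permutation = meta.get("permutation", list(range(ndim)))
    dim_names = meta.get("dim_names")

    if sorted(permutation) != list(range(ndim)):
        raise ValueError("permutation must be a permutation of [0, 1, .., N-1]")
    if dim_names is not None and len(dim_names) != ndim:
        raise ValueError("dim_names must have one entry per dimension")

    return {
        "shape": shape,
        "permutation": permutation,
        "dim_names": dim_names,
        # list_size of the FixedSizeList storage: product of the shape entries
        "list_size": prod(shape),
        # the i-th logical dimension is physical dimension permutation[i]
        "logical_shape": [shape[p] for p in permutation],
    }


info = parse_fixed_shape_tensor_metadata(
    '{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}'
)
print(info["list_size"])      # 10000000
print(info["logical_shape"])  # [500, 100, 200]
```

The second printed value matches the logical shape given in the proposal's permuted example.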