Antoine Pitrou created ARROW-2913:
-------------------------------------

             Summary: [Python] Exported buffers don't expose type information
                 Key: ARROW-2913
                 URL: https://issues.apache.org/jira/browse/ARROW-2913
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 0.10.0
            Reporter: Antoine Pitrou

Calling the {{buffers()}} method on an array returns the list of buffers backing the array, but those buffers lose their typing information. Below, the 2-byte buffer is the validity bitmap and the 80-byte buffer holds the ten int64 values, yet both are exported as plain bytes:

{code:python}
>>> import pyarrow as pa
>>> a = pa.array(range(10))
>>> a.type
DataType(int64)
>>> buffers = a.buffers()
>>> [(memoryview(buf).format, memoryview(buf).shape) for buf in buffers]
[('b', (2,)), ('b', (80,))]
{code}

By contrast, NumPy exposes type information through the Python buffer protocol:

{code:python}
>>> a = pa.array(range(10))
>>> memoryview(a.to_numpy()).format
'l'
>>> memoryview(a.to_numpy()).shape
(10,)
{code}

Exposing type information on buffers could be important for third-party systems such as Dask/distributed, for example to apply type-based data compression when serializing.

Since our C++ buffers are not typed, it's not obvious how to solve this. Should we return tensors instead?
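
As a rough illustration of the difference, the sketch below uses only the standard library and no Arrow API. It assumes a platform where the native {{'q'}} format is 8 bytes; the names {{raw}}, {{untyped}} and {{typed}} are made up for this example. A {{bytes}} object stands in for an untyped data buffer, and {{memoryview.cast()}} shows what a typed export of the same memory might look like to a consumer. It is not a proposed fix.

```python
import struct

# Ten int64 values packed in native byte order: a stand-in for the
# 80-byte data buffer shown in the issue (not an actual Arrow buffer).
raw = struct.pack("10q", *range(10))

# An untyped export: the consumer only sees 80 individual bytes.
# (bytes exports format 'B'; the Arrow buffers in the issue report 'b'.)
untyped = memoryview(raw)
print(untyped.format, untyped.shape)

# A typed view over the same memory: format 'q' (native 8-byte signed
# integer) and one element per value, which is the kind of information
# a type-aware consumer would need.
typed = untyped.cast("q")
print(typed.format, typed.shape)
print(typed.tolist())
```

A consumer that already knows the element type can recover it this way, but the point of the issue is that the exported buffer itself does not carry that knowledge.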