[ https://issues.apache.org/jira/browse/ARROW-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906196#comment-16906196 ]
Joris Van den Bossche commented on ARROW-6222:
----------------------------------------------

> Is there currently any way to serialize a dataframe that contains vectors
> (1d tensors) with feather?

I don't think so, at least not at the moment, as Feather does not support a list type. But see ARROW-5510 for plans to update Feather at some point to the Arrow IPC format (which would then support list types).

At the moment, I think Parquet is the better alternative if you want (simple, non-nested) list support in a file format supported by Arrow.


> Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError:
> list<item: float>`
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-6222
>                 URL: https://issues.apache.org/jira/browse/ARROW-6222
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.1
>            Reporter: Marcel Ackermann
>            Priority: Major
>
> I want to serialize pytorch tensors, but as they are not implemented in arrow
> yet I convert them to a numpy array like this: {{t.numpy()}}
> ([https://pytorch.org/docs/stable/tensors.html?highlight=numpy#torch.Tensor.numpy])
> which returns an {{ndarray}}. My tensors are 1-dimensional, the result is a
> 1-dimensional ndarray.
> Calling {{df.to_feather("fname.feather")}} yields
> {{pyarrow.lib.ArrowNotImplementedError: list<item: float>}}.
> Next I tried {{pyarrow.array(t.numpy())}} which results in
> {{pyarrow.lib.ArrowInvalid: ('Could not convert [\n 0.00500498,\n
> -0.00732583,\n... with type pyarrow.lib.FloatArray: did not recognize Python
> value type when inferring an Arrow data type', 'Conversion failed for column
> 0 with type object')}}.
> I would appreciate if this would work more out-of-the-box.
> Upon request a full example:
> {code:python}
> import torch
> import pyarrow
> import pandas as pd
> pd.DataFrame([[torch.ones(2)]], columns=["0"]).to_feather("fname.feather")
> pd.DataFrame([[torch.ones(2).numpy()]], columns=["0"]).to_feather("fname.feather")
> pd.DataFrame([[pyarrow.array(torch.ones(2).numpy())]], columns=["0"]).to_feather("fname.feather")
> {code}
> {code:python}
> ArrowInvalid: ('Could not convert tensor([1., 1.]) with type Tensor: did not
> recognize Python value type when inferring an Arrow data type', 'Conversion
> failed for column 0 with type object')
> ArrowNotImplementedError: list<item: float>
> ArrowInvalid: ('Could not convert [\n 1,\n 1\n] with type
> pyarrow.lib.FloatArray: did not recognize Python value type when inferring an
> Arrow data type', 'Conversion failed for column 0 with type object')
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)