[ 
https://issues.apache.org/jira/browse/ARROW-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906196#comment-16906196
 ] 

Joris Van den Bossche commented on ARROW-6222:
----------------------------------------------

>  Is there currently any way to serialize a dataframe that contains vectors 
> (1d tensors) with feather?

I don't think so, at least not at the moment, since feather does not support a 
list type. But see ARROW-5510 for plans to update feather at some point to use 
the Arrow IPC format (which would then support list types).
For now, I think parquet is the better alternative if you want (simple, 
non-nested) list support in a file format supported by Arrow.

> Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: 
> list<item: float>`
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-6222
>                 URL: https://issues.apache.org/jira/browse/ARROW-6222
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.1
>            Reporter: Marcel Ackermann
>            Priority: Major
>
> I want to serialize pytorch tensors, but as they are not implemented in arrow 
> yet I convert them to a numpy array like this: {{t.numpy()}} 
> ([https://pytorch.org/docs/stable/tensors.html?highlight=numpy#torch.Tensor.numpy])
> which returns an {{ndarray}}. My tensors are 1-dimensional, the result is a 
> 1-dimensional ndarray.
> Calling {{df.to_feather("fname.feather")}} yields 
> {{pyarrow.lib.ArrowNotImplementedError: list<item: float>}}.
> Next I tried {{pyarrow.array(t.numpy())}} which results in 
> {{pyarrow.lib.ArrowInvalid: ('Could not convert [\n  0.00500498,\n  
> -0.00732583,\n... with type pyarrow.lib.FloatArray: did not recognize Python 
> value type when inferring an Arrow data type', 'Conversion failed for column 
> 0 with type object')}}.
> I would appreciate if this would work more out-of-the-box.
> Upon request a full example:
> {code:python}
> import torch
> import pyarrow
> import pandas as pd
> pd.DataFrame([[torch.ones(2)]], columns=["0"]).to_feather("fname.feather")
> pd.DataFrame([[torch.ones(2).numpy()]], 
> columns=["0"]).to_feather("fname.feather")
> pd.DataFrame([[pyarrow.array(torch.ones(2).numpy())]], 
> columns=["0"]).to_feather("fname.feather")
> {code}
> {code:python}
> ArrowInvalid: ('Could not convert tensor([1., 1.]) with type Tensor: did not 
> recognize Python value type when inferring an Arrow data type', 'Conversion 
> failed for column 0 with type object')
> ArrowNotImplementedError: list<item: float>
> ArrowInvalid: ('Could not convert [\n  1,\n  1\n] with type 
> pyarrow.lib.FloatArray: did not recognize Python value type when inferring an 
> Arrow data type', 'Conversion failed for column 0 with type object')
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
