[ 
https://issues.apache.org/jira/browse/ARROW-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Ackermann updated ARROW-6222:
------------------------------------
    Description: 
I want to serialize PyTorch tensors, but since Arrow does not support them 
yet, I convert them to NumPy arrays with {{t.numpy()}} 
([https://pytorch.org/docs/stable/tensors.html?highlight=numpy#torch.Tensor.numpy]),
 which returns an {{ndarray}}. My tensors are 1-dimensional, so the result is a 
1-dimensional ndarray.

Calling {{df.to_feather("fname.feather")}} on a DataFrame that holds such an 
array in a cell yields 
{{pyarrow.lib.ArrowNotImplementedError: list<item: float>}}.

Next I tried wrapping the array with {{pyarrow.array(t.numpy())}} before 
storing it in the DataFrame, which results in 
{{pyarrow.lib.ArrowInvalid: ('Could not convert [\n  0.00500498,\n  
-0.00732583,\n... with type pyarrow.lib.FloatArray: did not recognize Python 
value type when inferring an Arrow data type', 'Conversion failed for column 0 
with type object')}}.

I would appreciate it if this worked more out of the box.

As requested, here is a full example:
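{code:python}
import torch
import pyarrow
import pandas as pd

# Each call fails on its own; the errors are listed below in the same order.
# 1. Tensor stored directly in a DataFrame cell
pd.DataFrame([[torch.ones(2)]], columns=["0"]).to_feather("fname.feather")
# 2. Tensor converted to a NumPy ndarray first
pd.DataFrame([[torch.ones(2).numpy()]], columns=["0"]).to_feather("fname.feather")
# 3. ndarray wrapped in a pyarrow array before being stored in the cell
pd.DataFrame([[pyarrow.array(torch.ones(2).numpy())]], columns=["0"]).to_feather("fname.feather")
{code}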


{code:python}
ArrowInvalid: ('Could not convert tensor([1., 1.]) with type Tensor: did not 
recognize Python value type when inferring an Arrow data type', 'Conversion 
failed for column 0 with type object')
ArrowNotImplementedError: list<item: float>
ArrowInvalid: ('Could not convert [\n  1,\n  1\n] with type 
pyarrow.lib.FloatArray: did not recognize Python value type when inferring an 
Arrow data type', 'Conversion failed for column 0 with type object')
{code}



> Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: 
> list<item: float>`
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-6222
>                 URL: https://issues.apache.org/jira/browse/ARROW-6222
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.1
>            Reporter: Marcel Ackermann
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
