[ 
https://issues.apache.org/jira/browse/ARROW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2966.
---------------------------------
    Resolution: Fixed
      Assignee: Wes McKinney

Resolved in ARROW-2814

> [Python] Data type conversion error
> -----------------------------------
>
>                 Key: ARROW-2966
>                 URL: https://issues.apache.org/jira/browse/ARROW-2966
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>         Environment: linux
>            Reporter: Christopher Brooks
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 0.11.0
>
>
> I have a large pandas DataFrame. When I try to convert it to a pyarrow table, 
> the conversion fails with the error below. I am not sure whether this is a bug 
> or expected behavior. I realize the code below showing the error is not much 
> use on its own. *What can I do to help identify the cause in my pandas DataFrame?*
> Here's the error:
>  
> {code:java}
> In [17]: pa.Table.from_pandas(df)
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-17-6eac5d0eec08> in <module>()
> ----> 1 pa.Table.from_pandas(df)
> table.pxi in pyarrow.lib.Table.from_pandas()
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
> 375 arrays = list(executor.map(convert_column,
> 376 columns_to_convert,
> --> 377 convert_types))
> 378 
> 379 types = [x.type for x in arrays]
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
> 584 # Careful not to keep a reference to the popped future
> 585 if timeout is None:
> --> 586 yield fs.pop().result()
> 587 else:
> 588 yield fs.pop().result(end_time - time.time())
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
> 423 raise CancelledError()
> 424 elif self._state == FINISHED:
> --> 425 return self.__get_result()
> 426 
> 427 self._condition.wait(timeout)
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
> 382 def __get_result(self):
> 383 if self._exception:
> --> 384 raise self._exception
> 385 else:
> 386 return self._result
> ~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
> 54 
> 55 try:
> ---> 56 result = self.fn(*self.args, **self.kwargs)
> 57 except BaseException as exc:
> 58 self.future.set_exception(exc)
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
> 364 
> 365 def convert_column(col, ty):
> --> 366 return pa.array(col, from_pandas=True, type=ty)
> 367 
> 368 if nthreads == 1:
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float
> In [18]: pa.__version__
> Out[18]: '0.9.0'
> In [19]: pd.__version__
> Out[19]: '0.23.3'
> {code}
>  
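
On the question of how to find the offending column: the error message says a str turned up in a column that was being converted as Double, so one plausible cause (an assumption, not confirmed in this issue) is an object column that mixes floats, for example NaN, with strings. A minimal sketch for spotting such columns follows. The helper names `value_types_by_column` and `mixed_columns` are hypothetical and not part of pandas or pyarrow, and the sample data is a stand-in for the real DataFrame.

```python
from collections import Counter


def value_types_by_column(columns):
    """Map each column name to a Counter of the Python type names it holds.

    `columns` is any iterable of (name, values) pairs. For a pandas
    DataFrame, df.items() yields pairs of that shape.
    """
    report = {}
    for name, values in columns:
        report[name] = Counter(type(v).__name__ for v in values)
    return report


def mixed_columns(columns):
    """Return only the columns that hold more than one Python type."""
    report = value_types_by_column(columns)
    return {name: counts for name, counts in report.items() if len(counts) > 1}


# Stand-in data (hypothetical); with a real DataFrame, pass df.items() instead.
sample = {
    "grade": [3.5, 4.0, "n/a"],   # floats mixed with a string
    "credits": [3.0, 4.0, 3.0],   # floats only
}
suspects = mixed_columns(sample.items())
print(suspects)
```

Any column reported here is a candidate for cleaning up before calling `pa.Table.from_pandas(df)`. Another way to narrow it down is to convert one column at a time with `pa.array(df[name], from_pandas=True)`, the same call that appears in the traceback, and note which column raises `ArrowInvalid`.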



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
