[
https://issues.apache.org/jira/browse/ARROW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575230#comment-16575230
]
Wes McKinney commented on ARROW-2966:
-------------------------------------
[~brooksch] In the next version of pyarrow (this didn't quite make it into
0.10.0), the exception will show you the offending value and its data type.
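
In the meantime, one rough way to narrow down the cause is to look at which Python types actually occur in each column. The sketch below is only an illustration, not part of pyarrow: the helper names `summarize_value_types` and `find_offending_values` are hypothetical, and it assumes that a column which should convert to Double holds only floats and missing values.

```python
from collections import Counter
import math


def summarize_value_types(values):
    """Count Python type names in a column, skipping None and NaN."""
    counts = Counter()
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        counts[type(v).__name__] += 1
    return dict(counts)


def find_offending_values(values, expected_type=float, limit=5):
    """Return up to `limit` (position, value) pairs not of `expected_type`."""
    found = []
    for i, v in enumerate(values):
        if v is None or isinstance(v, expected_type):
            continue
        found.append((i, v))
        if len(found) >= limit:
            break
    return found


# Stand-in for one object-dtype column that mixes floats and strings.
column = [1.5, float("nan"), "n/a", 2.0, None, "3.1"]
print(summarize_value_types(column))
print(find_offending_values(column))
```

Both helpers accept any iterable, so with a dataframe they could presumably be applied per column, for example `summarize_value_types(df[name])` for each `name` in `df.columns`. A column that reports both `float` and `str` is a likely candidate for the conversion error quoted below.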
> [Python] Data type conversion error
> -----------------------------------
>
> Key: ARROW-2966
> URL: https://issues.apache.org/jira/browse/ARROW-2966
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Environment: linux
> Reporter: Christopher Brooks
> Priority: Major
> Fix For: 0.11.0
>
>
> I have a big pandas dataframe. When I try to convert it to a pyarrow table,
> it fails with a conversion error. I'm not sure whether this is a bug or
> expected behavior. I realize the code below showing the error is pretty
> useless as is. *What can I do to help identify the cause in my pandas
> dataframe?*
> Here's the error:
>
> {code:java}
> In [17]: pa.Table.from_pandas(df)
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-17-6eac5d0eec08> in <module>()
> ----> 1 pa.Table.from_pandas(df)
> table.pxi in pyarrow.lib.Table.from_pandas()
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
> 375 arrays = list(executor.map(convert_column,
> 376 columns_to_convert,
> --> 377 convert_types))
> 378
> 379 types = [x.type for x in arrays]
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
> 584 # Careful not to keep a reference to the popped future
> 585 if timeout is None:
> --> 586 yield fs.pop().result()
> 587 else:
> 588 yield fs.pop().result(end_time - time.time())
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
> 423 raise CancelledError()
> 424 elif self._state == FINISHED:
> --> 425 return self.__get_result()
> 426
> 427 self._condition.wait(timeout)
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
> 382 def __get_result(self):
> 383 if self._exception:
> --> 384 raise self._exception
> 385 else:
> 386 return self._result
> ~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
> 54
> 55 try:
> ---> 56 result = self.fn(*self.args, **self.kwargs)
> 57 except BaseException as exc:
> 58 self.future.set_exception(exc)
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
> 364
> 365 def convert_column(col, ty):
> --> 366 return pa.array(col, from_pandas=True, type=ty)
> 367
> 368 if nthreads == 1:
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float
> In [18]: pa.__version__
> Out[18]: '0.9.0'
> In [19]: pd.__version__
> Out[19]: '0.23.3'
> {code}
>