Yunbo Deng created ARROW-17077:
----------------------------------
Summary: Unicode character issue with pyarrow
Key: ARROW-17077
URL: https://issues.apache.org/jira/browse/ARROW-17077
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Yunbo Deng
When running code that uses the Databricks SQL Connector for Python, the
pyarrow library hits a Unicode character issue. The customer has had to add a
workaround in the client code, something like
"SELECT decode(string(unbase64(value)), 'utf8')"
Exception in the main script No data fetched using SQL-statement: SELECT * FROM parquet.`abfss://[email protected]/structXXXXXXX`. Exception: Unknown error: Wrapping TP H�kan Sweater failed
Traceback (most recent call last):
  File "/home/xxxx/yy/allo/yy/db/sql_reader.py", line 53, in query
    rows = cursor.fetchmany(self.MAX_ROWS)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 401, in fetchmany
    return self.active_result_set.fetchmany(size)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 630, in fetchmany
    return self._convert_arrow_table(self.fetchmany_arrow(size))
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 563, in _convert_arrow_table
    df = table_renamed.to_pandas(
  File "pyarrow/array.pxi", line 822, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 3889, in pyarrow.lib.Table._to_pandas
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 803, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 763, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pandas/core/arrays/string_.py", line 217, in __from_arrow__
    str_arr = StringArray._from_sequence(np.array(arr))
  File "pyarrow/array.pxi", line 1395, in pyarrow.lib.Array.__array__
  File "pyarrow/array.pxi", line 1441, in pyarrow.lib.Array.to_numpy
  File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping TP H�kan Sweater failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
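
The "Wrapping ... failed" message appears consistent with a string value whose
bytes are not valid UTF-8, so that building a Python str from it fails. The
following is a minimal, standard-library-only sketch of that failure mode. It
does not use pyarrow or the Databricks connector. The sample value is
hypothetical: it assumes, purely for illustration, that the unreadable byte in
the message is a Latin-1 encoded "å" (0xE5). The helper names try_strict_utf8
and decode_lossy are made up for this sketch.

```python
from typing import Optional

# Hypothetical sample value: assumes the unreadable byte in the error message
# is a Latin-1 encoded "å" (0xE5). This is a guess, not taken from the report.
RAW = "TP H\u00e5kan Sweater".encode("latin-1")


def try_strict_utf8(raw: bytes) -> Optional[str]:
    """Decode raw as strict UTF-8; return None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lossy(raw: bytes) -> str:
    """Decode raw as UTF-8, substituting U+FFFD for each undecodable byte."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Strict decoding fails because 0xE5 is not followed by continuation bytes.
    print("strict:", try_strict_utf8(RAW))
    # Lossy decoding yields the replacement character seen in the error text.
    print("lossy: ", decode_lossy(RAW))
    # Decoding with the assumed source encoding recovers the original text.
    print("latin1:", RAW.decode("latin-1"))
```

If the data really is in a legacy encoding, decoding it with that encoding on
the client side (or fixing it at the source) would avoid the lossy replacement;
which encoding applies here is not stated in the report.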
--
This message was sent by Atlassian Jira
(v8.20.10#820010)