AlenkaF commented on issue #41625: URL: https://github.com/apache/arrow/issues/41625#issuecomment-2120014321
Hi, thank you for opening an issue @djouallah! I have been able to reproduce it in my dev environment.

For next time, it will be much easier to help if you present a simple reproducible example. The Google Colab notebook you linked has lots (lots!) of code not connected to the issue, and I was very reluctant at first to download files and manipulate them, but did so after taking the time to check the source and all the code. Also, the chance of actually getting an answer on your issue will be higher with a simple example ;)

Here is one I created that shows the issue:

```python
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> data = {'UNIT': ["DUNIT", "DUNIT", "DUNIT", "DUNIT"],
...         'version': [1, 1, 3, 3]}
>>> df = pd.DataFrame(data)
>>> df.index = df['version']
>>> df.columns.name = np.int64(142564)  # <-- the issue is here: numpy int64 column index name
>>> df
142564    UNIT  version
version
1        DUNIT        1
1        DUNIT        1
3        DUNIT        3
3        DUNIT        3
>>> pa.Table.from_pandas(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 4559, in pyarrow.lib.Table.from_pandas
    arrays, schema, n_rows = dataframe_to_arrays(
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 635, in dataframe_to_arrays
    pandas_metadata = construct_metadata(
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 257, in construct_metadata
    b'pandas': json.dumps({
               ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
```

The code works if I remove the column index name:

```python
>>> df.columns.name = None
>>> pa.Table.from_pandas(df)
pyarrow.Table
UNIT: string
version: int64
__index_level_0__: int64
----
UNIT: [["DUNIT","DUNIT","DUNIT","DUNIT"]]
version: [[1,1,3,3]]
__index_level_0__: [[1,1,3,3]]
```

It also works if a Python `int` is used for the name instead of `numpy.int64`.