AlenkaF commented on issue #41625:
URL: https://github.com/apache/arrow/issues/41625#issuecomment-2120014321

   Hi, thank you for opening an issue @djouallah!
   
   I was able to reproduce this in my dev environment. Next time, it will be 
much easier to help if you provide a minimal reproducible example. The Google 
Colab you linked contains a lot (a lot!) of code unrelated to the issue, and I 
was reluctant to download and manipulate files until I had taken the time to 
check the source and all of the code.
   A simple example also makes it more likely that you will get an answer to 
your issue ;)
   
   Here is one I created that shows the issue:
   
   ```python
   >>> import numpy as np
   >>> import pandas as pd
   >>> import pyarrow as pa
   
   >>> data = {'UNIT': ["DUNIT", "DUNIT", "DUNIT", "DUNIT"],
   ...         'version': [1, 1, 3, 3]}
   >>> df = pd.DataFrame(data)
   >>> df.index = df['version']
   >>> # The issue is here: a numpy int64 as the column index name
   >>> df.columns.name = np.int64(142564)
   >>> df
   142564    UNIT  version
   version                
   1        DUNIT        1
   1        DUNIT        1
   3        DUNIT        3
   3        DUNIT        3
   
   >>> pa.Table.from_pandas(df)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/table.pxi", line 4559, in pyarrow.lib.Table.from_pandas
       arrays, schema, n_rows = dataframe_to_arrays(
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 635, in dataframe_to_arrays
       pandas_metadata = construct_metadata(
                         ^^^^^^^^^^^^^^^^^^^
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 257, in construct_metadata
       b'pandas': json.dumps({
                  ^^^^^^^^^^^^
     File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 231, in dumps
       return _default_encoder.encode(obj)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
       chunks = self.iterencode(o, _one_shot=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
       return _iterencode(o, 0)
              ^^^^^^^^^^^^^^^^^
     File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 180, in default
       raise TypeError(f'Object of type {o.__class__.__name__} '
   TypeError: Object of type int64 is not JSON serializable
   ```
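
The last frames of the traceback can be reproduced without pyarrow or pandas. Here is a minimal sketch, assuming only that NumPy is installed (the variable names are illustrative), showing that the standard `json` encoder rejects `numpy.int64` but accepts the same value as a built-in `int`:

```python
import json

import numpy as np

name = np.int64(142564)

# numpy.int64 is not a subclass of Python's int, so the standard json
# encoder has no rule for serializing it and raises TypeError.
try:
    json.dumps({"name": name})
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

# Converting to a built-in int first makes the same value serializable.
print(json.dumps({"name": int(name)}))
```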
   
   The code works if I remove the column index name:
   
   ```python
   >>> df.columns.name = None
   >>> pa.Table.from_pandas(df)
   pyarrow.Table
   UNIT: string
   version: int64
   __index_level_0__: int64
   ----
   UNIT: [["DUNIT","DUNIT","DUNIT","DUNIT"]]
   version: [[1,1,3,3]]
   __index_level_0__: [[1,1,3,3]]
   ```
   
   It would also have worked if a Python `int` had been used instead of 
`numpy.int64`.

