lukaswenzl-akur8 commented on issue #44048:
URL: https://github.com/apache/arrow/issues/44048#issuecomment-2343095955

   Thanks for your quick answer and insights! 
   You are right that this is a rare, extreme edge case, but we want to 
avoid crashes. 
   
   For now we could use the workaround of converting to strings. 
   Schematically:
   ```python
   import numpy as np

   # 2_147_483_647 == 2**31 - 1, the int32 maximum
   if np.sum(df["float_gran"].cat.categories.str.len()) > 2_147_483_647:
       df["float_gran"] = df["float_gran"].astype(str)
   # ...works
   table.to_pandas().astype("category")
   ```
   
   This comes with a large performance penalty for the conversions, but at least 
it doesn't crash, and it only affects the edge case. 
   Building the whole schema each time could be error-prone for our more 
general use case.
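
A minimal sketch of how the string workaround could be applied to every categorical column rather than a single hard-coded one. The helper name `stringify_oversized_categoricals` and its `limit` parameter are hypothetical, not part of pandas or pyarrow. Like the snippet above, it counts characters rather than UTF-8 bytes, so for non-ASCII categories it is only an approximation of the 32-bit offset limit.

```python
import pandas as pd

INT32_MAX = 2_147_483_647  # 2**31 - 1, the threshold used in the snippet above


def stringify_oversized_categoricals(
    df: pd.DataFrame, limit: int = INT32_MAX
) -> tuple[pd.DataFrame, list[str]]:
    """Cast categorical columns whose categories exceed `limit` characters to str.

    Hypothetical helper: returns a copy of the frame plus the names of the
    columns that were cast, so they can be made categorical again later.
    """
    out = df.copy()
    converted = []
    for name in out.columns:
        col = out[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue
        # Total characters over the unique categories, not over all rows.
        total = int(col.cat.categories.astype(str).str.len().to_numpy().sum())
        if total > limit:
            out[name] = col.astype(str)
            converted.append(name)
    return out, converted


# Tiny demonstration with an artificially low limit so the cast triggers.
demo = pd.DataFrame(
    {"float_gran": pd.Categorical(["aaaa", "bb", "aaaa"]), "x": [1, 2, 3]}
)
safe, converted = stringify_oversized_categoricals(demo, limit=5)

# After the Arrow round trip, only the affected columns are cast back.
restored = safe.astype({name: "category" for name in converted})
```

In real use, `safe` would be handed to `pa.Table.from_pandas` in place of the original frame, and the cast back to `"category"` would be applied to the result of `table.to_pandas()`. Tracking the converted column names keeps the performance penalty limited to the columns that actually hit the edge case.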
   
   It is great to know that the upcoming pandas version may solve this. We will 
retest with pandas 3.0!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

