lukaswenzl-akur8 commented on issue #44048:
URL: https://github.com/apache/arrow/issues/44048#issuecomment-2343095955
Thanks for your quick answer and insights!
You are right that this is a rare, extreme edge case, but we want to
avoid crashes.
For now, we can work around it by converting the column to strings,
schematically:
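```python
import numpy as np

# 2_147_483_647 is the int32 maximum; Arrow's default `string` type uses
# 32-bit offsets (.str.len() counts characters, i.e. bytes for ASCII labels)
if np.sum(df["float_gran"].cat.categories.str.len()) > 2_147_483_647:
    df["float_gran"] = df["float_gran"].astype(str)  # plain strings, not categories
# ... works
table.to_pandas().astype("category")  # convert back to categorical afterwards
```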
This comes with a large performance penalty for the conversions, but at least
it doesn't crash and only affects the edge case.
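Arrow offsets index into a UTF-8 byte buffer, so the 32-bit limit applies to bytes rather than characters; for non-ASCII category labels a character count can underestimate the size. A minimal sketch of a byte-based version of the check above, assuming string-like categories (`categories_exceed_int32_offsets` is a hypothetical helper name, not part of pandas or pyarrow):

```python
import pandas as pd

INT32_MAX = 2**31 - 1  # largest offset a 32-bit Arrow `string` array can hold


def categories_exceed_int32_offsets(series: pd.Series, limit: int = INT32_MAX) -> bool:
    """Hypothetical helper: True if the categories' total UTF-8 size exceeds `limit` bytes."""
    categories = series.cat.categories.astype(str)
    # Count encoded bytes, since Arrow string offsets are byte positions.
    total_bytes = sum(len(label.encode("utf-8")) for label in categories)
    return total_bytes > limit
```

With `pd.Categorical(["a", "é"])` the categories take 2 characters but 3 bytes, so a limit of 2 is exceeded only when counting bytes.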
Building the whole schema explicitly each time would be error-prone for our
more general use case.
It is great to know that the upcoming pandas version may solve this. We will
retest with pandas 3.0!
--
This is an automated message from the Apache Git Service.