Hi, I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/, where Wes writes:
> "string or binary data would come with additional overhead while pandas > continues to use Python objects in its memory representation" Pandas 1.0 introduced StringDType which I thought could help with the issue (I didn't check the internals, I assume they still use Python objects, just not Numpy, but I had nothing to lose). My issue is that if I create an PyArrow array with a = pa.array(["aaaaa", "bbbbb"]*100000000) and call .to_pandas() the dtype of the dataframe is still "object". I tried to add a types_mapper function (docs is not really helpful so I've simply created def mapper(t): return pd.StringDtype) but it didn't work. Is this a future feature? Would it help anything? For now I'm happy to use category/dictionary data, as the column is low cardinality and it makes it 5x faster, but I was hoping for a simpler solution. I don't know the internals but if "aaaaa" and "bbbbb" are immutable strings it shouldn't really differ from using Category type (even if it's creating python objects for them, as it can be done with 2 immutable objects). Converting compressed parquet -> pyarrow is fast (less than 10 seconds), it's pyarrow -> pandas which is slow, running for 7 minutes (so I think pyarrow already has a nice implementation) Best regards, Adam Lippai