Upon converting to Arrow, the information about whether the original input was a list or an ndarray is lost, so any kind of sequence ends up as an Arrow List<T> type.
When converting back to pandas, we could return either a list or an ndarray. Returning ndarray is faster and much more memory efficient; producing lists would require creating a lot of Python objects. Hypothetically, we could add an option to return lists instead of ndarrays if there were a strong enough need.

- Wes

On Thu, Jan 18, 2018 at 2:10 PM, simba nyatsanga <simnyatsa...@gmail.com> wrote:
> Hi Wes,
>
> Great! Thanks for the pointer. From what I gather this is a fundamental and
> deliberate design decision. Would I be correct in saying the memory
> footprint and access speed of a NumPy array compared to that of a Python
> list is the reason why the conversion is done?
>
> Kind Regards
> Simba
>
> On Thu, 18 Jan 2018 at 20:35 Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi Simba,
>>
>> Yes -- Arrow list<T> types are converted to NumPy arrays when converting
>> back to pandas with to_pandas(...). This conversion happens in C++ code in
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L541
>>
>> - Wes
>>
>> On Thu, Jan 18, 2018 at 1:26 PM, simba nyatsanga <simnyatsa...@gmail.com>
>> wrote:
>>
>> > Good day everyone,
>> >
>> > I noticed what looks like type inference happening after persisting a
>> > pandas DataFrame where one of the column values is a list. When I load up
>> > the DataFrame again and do df.to_dict(), the value is no longer a list but
>> > a numpy array. I dug through functions in the pandas_compat.py to try and
>> > figure out at what point the dtype is being applied for that value.
>> >
>> > I'd like to verify if this is the intended behaviour.
>> >
>> > Here's an illustration of the behaviour:
>> >
>> > [image: Screen Shot 2018-01-18 at 15.54.59.png]
>> >
>> > Kind Regards
>> > Simba