Great, thank you for the explanation - it makes so much sense. I have a use case where once I've converted an Arrow table back to pandas I then convert it into a dictionary (with to_dict()). This dictionary then gets JSON serialised and sent over the wire for display on the client side. I encountered the behaviour when the JSON serialisation was failing for an ndarray.
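For illustration, here is a minimal sketch of one way to get such a dictionary through JSON serialisation. It is not part of Arrow or pandas; the names `ndarray_to_list` and `to_json` are hypothetical. It relies on the `default` hook of `json.dumps` and on the fact that NumPy arrays expose a `tolist()` method, and it assumes every non-serialisable value with a `tolist()` method should become a plain list.

```python
import json


def ndarray_to_list(obj):
    """`default` hook for json.dumps: turn array-like values into plain lists.

    Hypothetical helper. numpy.ndarray (and NumPy scalars) expose tolist(),
    so duck typing avoids importing NumPy in the serialisation layer.
    """
    if hasattr(obj, "tolist"):
        return obj.tolist()
    # Mirror the standard error for anything we do not know how to convert.
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


def to_json(record):
    # `record` is assumed to be the dict returned by df.to_dict().
    # json.dumps calls ndarray_to_list only for values it cannot encode itself.
    return json.dumps(record, default=ndarray_to_list)
```

With a DataFrame `df` loaded back from Arrow, this would be called as `to_json(df.to_dict())`. The conversion to Python lists then happens only at serialisation time, so the DataFrame itself keeps the more memory-efficient ndarrays.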
I think that, in addition to the performance/efficiency considerations you mentioned, there isn't a strong need for the list option (at least for me). I will handle such data types at the application level.

Thanks.

On Thu, 18 Jan 2018 at 23:01 Wes McKinney <wesmck...@gmail.com> wrote:

> Upon converting to Arrow, the information about whether the original
> input was a list or ndarray was lost. So any kind of sequence ends up
> as an Arrow List<T> type.
>
> When converting back to pandas, we could return either a list or an
> ndarray. Returning ndarray is faster and much more memory efficient;
> producing lists would require creating a lot of Python objects.
>
> Hypothetically, we could add an option to return lists instead of
> ndarrays if there were a strong enough need.
>
> - Wes
>
> On Thu, Jan 18, 2018 at 2:10 PM, simba nyatsanga <simnyatsa...@gmail.com>
> wrote:
> > Hi Wes,
> >
> > Great! Thanks for the pointer. From what I gather this is a fundamental and
> > deliberate design decision. Would I be correct in saying the memory
> > footprint and access speed of a NumPy array compared to that of a Python
> > list is the reason why the conversion is done?
> >
> > Kind Regards
> > Simba
> >
> > On Thu, 18 Jan 2018 at 20:35 Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> hi Simba,
> >>
> >> Yes -- Arrow list<T> types are converted to NumPy arrays when converting
> >> back to pandas with to_pandas(...). This conversion happens in C++ code in
> >>
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L541
> >>
> >> - Wes
> >>
> >> On Thu, Jan 18, 2018 at 1:26 PM, simba nyatsanga <simnyatsa...@gmail.com>
> >> wrote:
> >>
> >> > Good day everyone,
> >> >
> >> > I noticed what looks like type inference happening after persisting a
> >> > pandas DataFrame where one of the column values is a list. When I load up
> >> > the DataFrame again and do df.to_dict(), the value is no longer a list but
> >> > a numpy array. I dug through functions in the pandas_compat.py to try and
> >> > figure out at what point the dtype is being applied for that value.
> >> >
> >> > I'd like to verify if this is the intended behaviour.
> >> >
> >> > Here's an illustration of the behaviour:
> >> >
> >> > [image: Screen Shot 2018-01-18 at 15.54.59.png]
> >> >
> >> > Kind Regards
> >> > Simba