Re: PyArrow python list to numpy nd.array inference in pd.read_table

simba nyatsanga Thu, 18 Jan 2018 14:43:24 -0800

Great, thank you for the explanation - it makes so much sense. I have a use
case where once I've converted an Arrow table back to pandas I then convert
it into a dictionary (with to_dict()). This dictionary then gets JSON
serialised and sent over the wire for display on the client side. I
encountered the behaviour when the JSON serialisation was failing for an
ndarray.


I think in addition to the performance/efficiency considerations you
mentioned, there isn't a strong need for the list option (at least for me).
I will handle such data types at the application level.
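
A minimal sketch of what such application-level handling could look like,
assuming the failure comes from json.dumps meeting an ndarray inside the
to_dict() output. The names json_default and record are hypothetical, and the
stdlib array type stands in for the ndarray so the snippet runs without NumPy
(numpy.ndarray exposes the same tolist() method):

```python
import json
from array import array


def json_default(obj):
    """Fallback for json.dumps: turn array-like objects into plain lists."""
    # numpy.ndarray exposes tolist(); so does the stdlib array used below.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


# Hypothetical stand-in for one record of the to_dict() output after
# to_pandas(); array("q", ...) plays the role of the ndarray value.
record = {"id": 1, "values": array("q", [1, 2, 3])}

# json.dumps calls json_default only for objects it cannot serialise itself.
payload = json.dumps(record, default=json_default)
print(payload)
```

Passing the hook through default= leaves the DataFrame untouched and converts
values only at serialisation time, so the ndarray representation is kept
everywhere else in the application.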

Thanks.

On Thu, 18 Jan 2018 at 23:01 Wes McKinney <wesmck...@gmail.com> wrote:

> Upon converting to Arrow, the information about whether the original
> input was a list or ndarray was lost. So any kind of sequence ends up
> as an Arrow List<T> type.
>
> When converting back to pandas, we could return either a list or an
> ndarray. Returning ndarray is faster and much more memory efficient;
> producing lists would require creating a lot of Python objects.
>
> Hypothetically, we could add an option to return lists instead of
> ndarrays if there were a strong enough need.
>
> - Wes
>
> On Thu, Jan 18, 2018 at 2:10 PM, simba nyatsanga <simnyatsa...@gmail.com>
> wrote:
> > Hi Wes,
> >
> > Great! Thanks for the pointer. From what I gather this is a fundamental
> > and deliberate design decision. Would I be correct in saying the memory
> > footprint and access speed of a NumPy array compared to that of a Python
> > list is the reason why the conversion is done?
> >
> > Kind Regards
> > Simba
> >
> > On Thu, 18 Jan 2018 at 20:35 Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> hi Simba,
> >>
> >> Yes -- Arrow list<T> types are converted to NumPy arrays when converting
> >> back to pandas with to_pandas(...). This conversion happens in C++ code in
> >>
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L541
> >>
> >> - Wes
> >>
> >> On Thu, Jan 18, 2018 at 1:26 PM, simba nyatsanga <simnyatsa...@gmail.com>
> >> wrote:
> >>
> >> > Good day everyone,
> >> >
> >> > I noticed what looks like type inference happening after persisting a
> >> > pandas DataFrame where one of the column values is a list. When I load up
> >> > the DataFrame again and do df.to_dict(), the value is no longer a list but
> >> > a numpy array. I dug through functions in the pandas_compat.py to try and
> >> > figure out at what point the dtype is being applied for that value.
> >> >
> >> > I'd like to verify if this is the intended behaviour.
> >> >
> >> > Here's an illustration of the behaviour:
> >> >
> >> > [image: Screen Shot 2018-01-18 at 15.54.59.png]
> >> >
> >> > Kind Regards
> >> > Simba
> >> >
> >>
>
