Upon conversion to Arrow, the information about whether the original
input was a list or an ndarray is lost, so any kind of sequence ends up
as an Arrow list<T> type.

When converting back to pandas, we could return either a list or an
ndarray. Returning ndarray is faster and much more memory efficient;
producing lists would require creating a lot of Python objects.

Hypothetically, we could add an option to return lists instead of
ndarrays if there were a strong enough need.

- Wes

On Thu, Jan 18, 2018 at 2:10 PM, simba nyatsanga <simnyatsa...@gmail.com> wrote:
> Hi Wes,
>
> Great! Thanks for the pointer. From what I gather this is a fundamental and
> deliberate design decision. Would I be correct in saying the memory
> footprint and access speed of a NumPy array compared to that of a Python
> list is the reason why the conversion is done?
>
> Kind Regards
> Simba
>
> On Thu, 18 Jan 2018 at 20:35 Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi Simba,
>>
>> Yes -- Arrow list<T> types are converted to NumPy arrays when converting
>> back to pandas with to_pandas(...). This conversion happens in C++ code in
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L541
>>
>> - Wes
>>
>> On Thu, Jan 18, 2018 at 1:26 PM, simba nyatsanga <simnyatsa...@gmail.com> wrote:
>>
>> > Good day everyone,
>> >
>> > I noticed what looks like type inference happening after persisting a
>> > pandas DataFrame where one of the column values is a list. When I load up
>> > the DataFrame again and do df.to_dict(), the value is no longer a list but
>> > a numpy array. I dug through functions in the pandas_compat.py to try and
>> > figure out at what point the dtype is being applied for that value.
>> >
>> > I'd like to verify if this is the intended behaviour.
>> >
>> > Here's an illustration of the behaviour:
>> >
>> > [image: Screen Shot 2018-01-18 at 15.54.59.png]
>> >
>> > Kind Regards
>> > Simba
>> >
>>
