Hello Arrow and Ibis devs,

I've noticed that Arrow's to_pandas method produces different types than the
Ibis test suite expects:

   - Lists are returned as numpy arrays by Arrow, but Ibis expects Python
     list objects.
   - NULL values in integer columns are converted to NaN by Arrow, but Ibis
     expects None.

There's an argument to be made that what Arrow is doing is more correct,
and it is certainly more performant than Python objects. I think it'd be
helpful if the pandas, Ibis, and Arrow communities aligned on what the
intended pandas types are for these complex values.

If not, are there more generic test utilities we could use in Ibis to
accept numpy arrays in backend output?
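
As a rough idea of what such a utility could look like, here is a hypothetical normalize_value helper (not an existing Ibis, pandas, or Arrow API) that maps Arrow-style cell values back to the plain Python objects the Ibis tests currently expect. It assumes only numpy.

```python
import math

import numpy as np


def normalize_value(value):
    """Hypothetical test helper: convert Arrow-style pandas cells to Python objects."""
    if isinstance(value, np.ndarray):
        # List cells arrive as numpy arrays; tolist() does not recurse into
        # object arrays of arrays, so normalize each element explicitly.
        return [normalize_value(v) for v in value.tolist()]
    if isinstance(value, dict):
        # Struct cells arrive as dicts; normalize each field.
        return {k: normalize_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_value(v) for v in value]
    if isinstance(value, float) and math.isnan(value):
        # NULL integers arrive as NaN (np.float64 is a float subclass).
        return None
    if isinstance(value, np.generic):
        # Unwrap remaining numpy scalars such as np.int64 to Python scalars.
        return normalize_value(value.item())
    return value
```

One caveat of this design: it also turns genuine floating-point NaN into None, so it should only be applied to columns whose expected type is integer, list, or struct.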

Or should Ibis start adopting Arrow directly, at least for complex types,
perhaps via Fletcher?

*Background:*

I recently sent a PR to Ibis to mark several BigQuery tests as xfail
(github.com/ibis-project/ibis/pull/2375). I believe they started failing
when the google-cloud-bigquery library switched to Arrow's to_pandas method
(PR: github.com/googleapis/google-cloud-python/pull/10027) instead of a
slower method that doesn't use Arrow.

These test failures are due to to_pandas returning different types than the
Ibis tests expect, such as numpy arrays in the case of lists (ibis#2370
<https://github.com/ibis-project/ibis/issues/2370>, ibis#2372
<https://github.com/ibis-project/ibis/issues/2372>, ibis#2374
<https://github.com/ibis-project/ibis/issues/2374>), NaN values for NULL
integers (ibis#2371 <https://github.com/ibis-project/ibis/issues/2371>),
and an unimplemented conversion for structs containing lists (ibis#2373
<https://github.com/ibis-project/ibis/issues/2373>).

I'd like to figure out what the next steps should be. Options:

   - Get BigQuery to output the Python objects that Ibis currently expects,
   - Change Ibis to expect more Arrow-aligned types for complex types, or
   - Update the Ibis tests to accept either Python objects or the output of
     Arrow's to_pandas method.

Thanks for your help,

*Tim Swast*
Senior Software Friendliness Engineer, Data & Analytics
Google Cloud Developer Relations
Chicago, IL, USA
