Re: [PR] feat: Add `flatten` array function [arrow-datafusion-python]

via GitHub Sat, 10 Feb 2024 04:08:20 -0800


mobley-trent commented on PR #562:
URL: 
https://github.com/apache/arrow-datafusion-python/pull/562#issuecomment-1936990813


   Hey @ongchi, I tested the `flatten` function and it's failing. Here is the 
code:
   ```python
   from datafusion import SessionContext, column
   from datafusion import functions as f
   import numpy as np
   import pyarrow as pa
   
   
   def py_flatten(arr):
       # Testing helper function
       result = []
       for elem in arr:
           if isinstance(elem, list):
               result.extend(py_flatten(elem))
           else:
               result.append(elem)
       return result
   
   ctx = SessionContext()
   data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
   
   batch = pa.RecordBatch.from_arrays(
       [np.array(data, dtype=object)], names=["arr"]
   )
   df = ctx.create_dataframe([[batch]])
   col = column("arr")
   
   
   stmt = f.flatten(col)
   py_expr = lambda: [py_flatten(data)]
   
   result = df.select(stmt).collect()[0].column(0).tolist()
   
   print(f"flatten query: {result}")
   print(f"py_expr: {py_expr()}")
   ```
   
   Results:
   ```
   >>> flatten query: [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
   >>> py_expr: [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
   ```
   
   I expected the flatten query result to be identical to the `py_expr` result. Is there 
something I overlooked, or is this an underlying bug?
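
One possible reading of the output, sketched in plain Python rather than through DataFusion: if `flatten` is applied to each row's value independently (an assumption here, not something confirmed in this thread), then a column with three rows of already-flat lists would come back unchanged, while a single row holding the whole nested list would collapse to one flat list. The helper name `flatten_row` is hypothetical and only models that assumed behaviour.

```python
def flatten_row(value):
    """Recursively flatten one row's value; a flat list comes back unchanged."""
    out = []
    for elem in value:
        if isinstance(elem, list):
            out.extend(flatten_row(elem))
        else:
            out.append(elem)
    return out


data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Case 1: three rows, each holding a flat list (the layout of the
# RecordBatch built in the snippet above).
per_row = [flatten_row(row) for row in data]

# Case 2: one row holding the whole nested list as a single value.
single_row = [flatten_row(row) for row in [data]]

print(f"per_row: {per_row}")
print(f"single_row: {single_row}")
```

Under that assumption, case 1 matches the `flatten query` output and case 2 matches `py_expr`, which would point to a difference in how the test data is laid out rather than to a bug in `flatten` itself.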


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
