mobley-trent commented on PR #562:
URL:
https://github.com/apache/arrow-datafusion-python/pull/562#issuecomment-1936990813
Hey @ongchi, I tested the `flatten` function and it's failing. Here is the
code:
```python
from datafusion import SessionContext, column
from datafusion import functions as f
import numpy as np
import pyarrow as pa


def py_flatten(arr):
    # Testing helper: recursively flatten nested Python lists
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result


ctx = SessionContext()
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
batch = pa.RecordBatch.from_arrays(
    [np.array(data, dtype=object)], names=["arr"]
)
df = ctx.create_dataframe([[batch]])
col = column("arr")
stmt = f.flatten(col)
py_expr = lambda: [py_flatten(data)]
result = df.select(stmt).collect()[0].column(0).tolist()
print(f"flatten query: {result}")
print(f"py_expr: {py_expr()}")
```
Results:
```
>>> flatten query: [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
>>> py_expr: [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```
I expected the flatten query result to be identical to the `py_expr` output.
Is there something I overlooked, or is this an underlying bug?