giacomorebecchi opened a new issue, #8512:
URL: https://github.com/apache/arrow-datafusion/issues/8512
### Describe the bug
In version 33.0.0, I encountered the following bug, which was not present in version 32.0.0.
Running an aggregation with `ARRAY_AGG()` over a column of list type, with an `ORDER BY` on a column of non-list type, returns the following error:
`Execution error: Expects values arguments and/or ordering_values arguments to have same size`
### To Reproduce
I have an MRE in Python:
`pip install "pyarrow==14.0.0" "datafusion==33.0.0"`
```python
import datetime
import random

import datafusion
import pyarrow as pa
import pyarrow.dataset as pda

N_ROWS = 10_000
N_CARDS = 1_000
N_PRODUCTS = 50

ta = pa.Table.from_pydict(
    {
        "Card.Id": random.choices([str(i) for i in range(N_CARDS)], k=N_ROWS),
        "Date": (datetime.date(2023, (i % 12) + 1, (i % 28) + 1) for i in range(N_ROWS)),
        "Product.Ids": [random.choices([i for i in range(N_PRODUCTS)], k=2) for i in range(N_ROWS)],
    }
)

query = """
SELECT
    "Card.Id"
    , FIRST_VALUE("Product.Ids" ORDER BY "Date")
    , LAST_VALUE("Product.Ids" ORDER BY "Date")
    , ARRAY_AGG("Product.Ids" ORDER BY "Date")
FROM "table"
GROUP BY "Card.Id"
"""

ctx = datafusion.SessionContext()
ctx.register_dataset(name="table", dataset=pda.dataset(ta))
df = ctx.sql(query)
compute_ta = pa.Table.from_batches(df.collect())
```
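For reference, here is a plain-Python sketch of the result the query above is presumably meant to produce: per card, the list values sorted by date, plus the first and last of them. The function name `ordered_list_aggregates`, the result keys, and the sample rows are hypothetical and only illustrate the intended semantics; this is not how DataFusion implements these aggregates, and SQL does not guarantee any particular order among rows with equal dates.

```python
import datetime
from collections import defaultdict


def ordered_list_aggregates(rows):
    """Group (card_id, date, product_ids) rows by card and order them by date.

    Hypothetical helper mirroring the intent of FIRST_VALUE, LAST_VALUE and
    ARRAY_AGG with ORDER BY "Date"; not DataFusion's implementation.
    """
    groups = defaultdict(list)
    for card_id, date, product_ids in rows:
        groups[card_id].append((date, product_ids))

    result = {}
    for card_id, entries in groups.items():
        # sorted() is stable, so rows with equal dates keep their input order
        # (an assumption of this sketch; SQL leaves tie order unspecified).
        ordered = [ids for _, ids in sorted(entries, key=lambda e: e[0])]
        result[card_id] = {
            "first_value": ordered[0],
            "last_value": ordered[-1],
            "array_agg": ordered,
        }
    return result


# Made-up sample rows, small enough to check by hand.
rows = [
    ("1", datetime.date(2023, 3, 1), [7, 8]),
    ("1", datetime.date(2023, 1, 1), [1, 2]),
    ("2", datetime.date(2023, 2, 1), [3, 4]),
]
expected = ordered_list_aggregates(rows)
```

With these sample rows, card `"1"` has its two lists ordered by date, so the list dated January comes before the one dated March.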
### Expected behavior
_No response_
### Additional context
_No response_