giacomorebecchi opened a new issue, #8512:
URL: https://github.com/apache/arrow-datafusion/issues/8512

   ### Describe the bug
   
   In version 33.0.0 I encountered the following bug, which is not present in version 32.0.0:
   running an aggregation with `ARRAY_AGG()` over a list-typed column, with an `ORDER BY` on a non-list column, fails with the following error:
   `Execution error: Expects values arguments and/or ordering_values arguments to have same size`
   
   ### To Reproduce
   
   Minimal reproducible example (MRE) in Python:
   `pip install "pyarrow==14.0.0" "datafusion==33.0.0"`
   
   ```python
   import datetime
   import random
   
   import datafusion
   import pyarrow as pa
   import pyarrow.dataset as pda
   
   N_ROWS = 10_000
   N_CARDS = 1_000
   N_PRODUCTS = 50
   
   ta = pa.Table.from_pydict(
       {
           "Card.Id": random.choices([str(i) for i in range(N_CARDS)], k=N_ROWS),
           # Use a list, like the other columns, instead of a one-shot generator.
           "Date": [datetime.date(2023, (i % 12) + 1, (i % 28) + 1) for i in range(N_ROWS)],
           # List-typed column: two random product ids per row.
           "Product.Ids": [random.choices(range(N_PRODUCTS), k=2) for _ in range(N_ROWS)]
       }
   )
   
   query = """
   SELECT
       "Card.Id"
       , FIRST_VALUE("Product.Ids" ORDER BY "Date")
       , LAST_VALUE("Product.Ids" ORDER BY "Date")
       , ARRAY_AGG("Product.Ids" ORDER BY "Date")
   FROM "table"
   GROUP BY "Card.Id"
   """
   
   ctx = datafusion.SessionContext()
   ctx.register_dataset(name="table",
                        dataset=pda.dataset(ta))
   df = ctx.sql(query)
   compute_ta = pa.Table.from_batches(df.collect())
   ```
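   For reference, here is a plain-Python sketch of what the query above should return for each `"Card.Id"`, independent of DataFusion. The helper `expected_aggregates` is hypothetical (it is not part of DataFusion or of the reproducer). It assumes that rows with equal `"Date"` values keep their input order; SQL leaves that tie order unspecified, so DataFusion may legitimately differ on ties.

   ```python
   import datetime
   from collections import defaultdict


   def expected_aggregates(card_ids, dates, product_ids):
       """Hypothetical reference for the MRE query.

       Returns {card: (FIRST_VALUE, LAST_VALUE, ARRAY_AGG)}, each ordered by date.
       Ties keep input order because Python's sort is stable (an assumption;
       SQL does not define the order of tied rows).
       """
       groups = defaultdict(list)
       for card, date, ids in zip(card_ids, dates, product_ids):
           groups[card].append((date, ids))

       result = {}
       for card, rows in groups.items():
           # Sort by the date only; the list values themselves are never compared.
           rows.sort(key=lambda row: row[0])
           ordered = [ids for _, ids in rows]
           result[card] = (ordered[0], ordered[-1], ordered)
       return result


   cards = ["a", "b", "a", "a"]
   dates = [
       datetime.date(2023, 3, 1),
       datetime.date(2023, 1, 1),
       datetime.date(2023, 1, 5),
       datetime.date(2023, 2, 1),
   ]
   products = [[1, 2], [3, 4], [5, 6], [7, 8]]

   expected = expected_aggregates(cards, dates, products)
   print(expected["a"])  # ([5, 6], [1, 2], [[5, 6], [7, 8], [1, 2]])
   ```

   With the MRE's table, the inputs could be taken from `ta.column("Card.Id").to_pylist()`, `ta.column("Date").to_pylist()` and `ta.column("Product.Ids").to_pylist()`, and the `ARRAY_AGG` results compared against the DataFusion output once the query succeeds.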
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_
