jayzhan211 commented on code in PR #9679:
URL: https://github.com/apache/arrow-datafusion/pull/9679#discussion_r1530469275


##########
datafusion/sqllogictest/test_files/dictionary.slt:
##########
@@ -280,3 +280,70 @@ ORDER BY
 2023-12-20T01:20:00 1000 f2 foo
 2023-12-20T01:30:00 1000 f1 32.0
 2023-12-20T01:30:00 1000 f2 foo
+
+# Cleanup
+statement error DataFusion error: Execution error: Table 'm1' doesn't exist\.
+drop table m1;
+
+statement error DataFusion error: Execution error: Table 'm2' doesn't exist\.
+drop table m2;
+
+######
+# Create a table using UNION ALL to get 2 partitions (very important)
+######
+statement ok
+create table m3_source as
+    select * from (values('foo', 'bar', 1))
+    UNION ALL
+    select * from (values('foo', 'baz', 1));
+
+######
+# Now, create a table with the same data, but column2 has type `Dictionary(Int32)` to trigger the fallback code

Review Comment:
   I'm curious about how and where the `DictionaryArray` was built. It is 
quite hard to trace the callers upstream of 
`GroupedHashAggregateStream::poll_next` with `RUST_BACKTRACE`. This is the 
batch I see at the line linked below:
   
   
https://github.com/apache/arrow-datafusion/blob/b0b329ba39403b9e87156d6f9b8c5464dc6d2480/datafusion/physical-plan/src/aggregates/row_hash.rs#L434
   ```
   batch: RecordBatch { schema: Schema { fields: [Field { name: "column3", 
data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: 
{} }, Field { name: "COUNT(DISTINCT m3.column1)[count distinct]", data_type: 
List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }), nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: {} }, Field { name: "COUNT(DISTINCT 
m3.column2)[count distinct]", data_type: List(Field { name: "item", data_type: 
Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, 
metadata: {} }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: 
{} }], metadata: {} }, columns: [PrimitiveArray<Int64>
   [
     1,
     1,
   ], ListArray
   [
     StringArray
   [
     "foo",
   ],
     StringArray
   [
     "foo",
   ],
   ], ListArray
   [
     DictionaryArray {keys: PrimitiveArray<Int32>
   [
     0,
   ] values: StringArray
   [
     "bar",
     "baz",
   ]}
   ,
     DictionaryArray {keys: PrimitiveArray<Int32>
   [
     1,
   ] values: StringArray
   [
     "bar",
     "baz",
   ]}
   ,
   ]], row_count: 2 }
   ```
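
For readers of the dump above: both list elements carry the full values array `["bar", "baz"]` but only one key each (`0` and `1`). That layout is what you would see if a single dictionary-encoded column were sliced row by row, since a slice narrows the keys and leaves the values untouched. The sketch below models this with plain Rust; `DictSketch` and its methods are hypothetical names for illustration, not the arrow-rs `DictionaryArray` API, and it does not show where DataFusion actually builds the array.

```rust
// Hypothetical stand-in for a dictionary-encoded string array.
// This is NOT the arrow-rs `DictionaryArray` API, only a model of its layout.
#[derive(Debug, Clone, PartialEq)]
struct DictSketch {
    keys: Vec<i32>,      // one index into `values` per row
    values: Vec<String>, // distinct values, carried along by every slice
}

impl DictSketch {
    /// Dictionary-encode `rows`, assigning keys in order of first appearance.
    fn encode(rows: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match values.iter().position(|v| v.as_str() == *row) {
                Some(i) => i,
                None => {
                    values.push((*row).to_string());
                    values.len() - 1
                }
            };
            keys.push(key as i32);
        }
        DictSketch { keys, values }
    }

    /// Take rows `offset..offset + len`: only the keys are narrowed,
    /// the values are kept whole.
    fn slice(&self, offset: usize, len: usize) -> Self {
        DictSketch {
            keys: self.keys[offset..offset + len].to_vec(),
            values: self.values.clone(),
        }
    }

    /// Resolve every key back to the string it points at.
    fn decode(&self) -> Vec<&str> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].as_str())
            .collect()
    }
}

fn main() {
    // column2 of the two input rows in the test: 'bar' and 'baz'.
    let column2 = DictSketch::encode(&["bar", "baz"]);
    // One single-row slice per group entry, mirroring the two list elements.
    println!("{:?}", column2.slice(0, 1));
    println!("{:?}", column2.slice(1, 1));
}
```

If this reading is right, the two `DictionaryArray` entries come from one encoded column rather than from two separately built dictionaries, which may narrow down where to look.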



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
