Rich-T-kid commented on code in PR #21589:
URL: https://github.com/apache/datafusion/pull/21589#discussion_r3096567439


##########
datafusion/physical-plan/src/aggregates/group_values/mod.rs:
##########
@@ -207,6 +258,19 @@ pub fn new_group_values(
             Ok(Box::new(GroupValuesColumn::<true>::try_new(schema)?))
         }
     } else {
+        // TODO:  add specialized implementation for dictionary encoding 
columns for 2+ group by columns case
         Ok(Box::new(GroupValuesRows::try_new(schema)?))
     }
 }
+
+fn supported_single_dictionary_value(t: &DataType) -> bool {
+    matches!(
+        t,
+        DataType::Utf8
+            | DataType::LargeUtf8
+            | DataType::Binary
+            | DataType::LargeBinary
+            | DataType::Utf8View
+            | DataType::BinaryView
+    )
+}

Review Comment:
   This PR is intended to serve as a proof of concept that dictionary encoding 
works and improves efficiency. It makes sense to start with string types since 
that is what dictionary arrays are most commonly used with and where they 
provide the most benefit. A follow up issue can be created to support 
additional types such as **LargeUtf8**, **LargeList**, and **numeric values** 
,which should be as minimal as adding the relevant branches in `get_raw_bytes` 
and `sentinel_repr`. This is also related to how sentinel representations of 
types work but I think thats worth a separate comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to