pitrou commented on code in PR #50224:
URL: https://github.com/apache/arrow/pull/50224#discussion_r3473298339


##########
cpp/src/arrow/compute/row/row_encoder_internal.h:
##########
@@ -252,6 +252,31 @@ struct VarLengthKeyEncoder : KeyEncoder {
   std::shared_ptr<DataType> type_;
 };
 
+// Encodes BinaryView/StringView keys in the same variable-length row format as
+// VarLengthKeyEncoder. The encoded row copies the key bytes, so Decode builds 
a
+// fresh view array rather than aliasing the input's variadic buffers.

Review Comment:
   Since the encode path is the same as for `VarLengthKeyEncoder` (only the 
decoding differs), can we factor out the encoding methods in a common base 
class?



##########
cpp/src/arrow/compute/row/grouper_test.cc:
##########
@@ -913,6 +927,42 @@ TEST(Grouper, StringKey) {
   }
 }
 
+TEST(Grouper, StringViewKey) {
+  // Mix inline (<=12 byte) and out-of-line view keys.
+  for (auto ty : {utf8_view(), binary_view()}) {
+    ARROW_SCOPED_TRACE("key type = ", *ty);
+    {
+      TestGrouper g({ty});
+      g.ExpectConsume(R"([["eh"], ["eh"]])", "[0, 0]");
+      g.ExpectConsume(R"([["eh"], ["eh"]])", "[0, 0]");
+      g.ExpectConsume(R"([["be"], [null]])", "[1, 2]");
+      g.ExpectConsume(R"([["a long out-of-line view"], ["a long out-of-line 
view"]])",
+                      "[3, 3]");
+      g.ExpectUniques(R"([["eh"], ["be"], [null], ["a long out-of-line 
view"]])");

Review Comment:
   Test with several different out-of-line views as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to