luis4a0 commented on code in PR #12107:
URL: https://github.com/apache/gluten/pull/12107#discussion_r3260487761


##########
cpp/velox/shuffle/VeloxHashShuffleWriter.cc:
##########
@@ -1315,6 +1316,47 @@ uint64_t VeloxHashShuffleWriter::valueBufferSizeForFixedWidthArray(uint32_t fixe
   return valueBufferSize;
 }
 
+void VeloxHashShuffleWriter::accumulateInputEncodingCounts(const ColumnarBatch& cb) {
+  // Only velox-typed batches expose per-child encoding; foreign batches
+  // (e.g. arrow round-trips coming from non-velox sources) will be flattened
+  // by `VeloxColumnarBatch::from` later and we'd undercount, so just skip
+  // them here rather than reporting a misleading "all flat" mix.
+  if (cb.getType() != "velox") {
+    return;
+  }
+  const auto* veloxBatch = dynamic_cast<const VeloxColumnarBatch*>(&cb);
+  if (veloxBatch == nullptr) {
+    return;
+  }
+  const auto& rowVector = veloxBatch->getRowVector();
+  if (rowVector == nullptr) {
+    return;
+  }
+  for (const auto& child : rowVector->children()) {
+    if (child == nullptr) {
+      ++inputEncodingCounts_[kInputEncodingOther];
+      continue;
+    }
+    switch (child->encoding()) {
+      case facebook::velox::VectorEncoding::Simple::FLAT:
+        ++inputEncodingCounts_[kInputEncodingFlat];
+        break;
+      case facebook::velox::VectorEncoding::Simple::DICTIONARY:
+        ++inputEncodingCounts_[kInputEncodingDictionary];
+        break;
+      case facebook::velox::VectorEncoding::Simple::CONSTANT:
+        ++inputEncodingCounts_[kInputEncodingConstant];
+        break;
+      case facebook::velox::VectorEncoding::Simple::LAZY:
+        ++inputEncodingCounts_[kInputEncodingLazy];
+        break;
+      default:
+        ++inputEncodingCounts_[kInputEncodingOther];

Review Comment:
   Good catch, fixed in a208a78. Added a new `kInputEncodingComplex` bucket so 
ROW / MAP / FLAT_MAP / ARRAY no longer get conflated with the rare-encoding 
catch-all. `kInputEncodingOther` now covers only BIASED / SEQUENCE / FUNCTION 
(and any future additions to `VectorEncoding::Simple`). A new `complex` gtest 
case exercises ARRAY + MAP children landing in the new bucket; the existing 
cases also assert `kInputEncodingComplex == 0`, so the boundary is checked.
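
   For readers following along, here is a minimal sketch of the bucketing 
described in this comment. It is not the code from a208a78: the `Encoding` enum 
is a local stand-in for `facebook::velox::VectorEncoding::Simple` so the snippet 
compiles without Velox headers, and the numeric bucket values, `bucketFor`, and 
`countEncodings` are hypothetical names chosen for illustration. Null children, 
which the hunk above counts under `kInputEncodingOther`, are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for facebook::velox::VectorEncoding::Simple (assumption: a local
// copy of the value names mentioned in the review, not the Velox header).
enum class Encoding {
  BIASED, CONSTANT, DICTIONARY, FLAT, SEQUENCE,
  ROW, MAP, FLAT_MAP, ARRAY, LAZY, FUNCTION
};

// Bucket names follow the PR; the numeric values are assumed.
enum InputEncodingBucket : std::size_t {
  kInputEncodingFlat = 0,
  kInputEncodingDictionary,
  kInputEncodingConstant,
  kInputEncodingLazy,
  kInputEncodingComplex,
  kInputEncodingOther,
  kNumInputEncodingBuckets
};

// Hypothetical helper: maps one child encoding to its bucket.
constexpr InputEncodingBucket bucketFor(Encoding e) {
  switch (e) {
    case Encoding::FLAT:
      return kInputEncodingFlat;
    case Encoding::DICTIONARY:
      return kInputEncodingDictionary;
    case Encoding::CONSTANT:
      return kInputEncodingConstant;
    case Encoding::LAZY:
      return kInputEncodingLazy;
    // Nested types get their own bucket instead of falling into "other".
    case Encoding::ROW:
    case Encoding::MAP:
    case Encoding::FLAT_MAP:
    case Encoding::ARRAY:
      return kInputEncodingComplex;
    // BIASED / SEQUENCE / FUNCTION and any value added later.
    default:
      return kInputEncodingOther;
  }
}

using EncodingCounts = std::array<std::uint64_t, kNumInputEncodingBuckets>;

// Hypothetical helper: tallies the buckets for one batch's child encodings.
EncodingCounts countEncodings(const std::vector<Encoding>& children) {
  EncodingCounts counts{};  // value-initialized, so every bucket starts at 0
  for (Encoding e : children) {
    ++counts[bucketFor(e)];
  }
  return counts;
}
```

   Keeping the mapping in a single `constexpr` function is one possible way to 
make the bucket boundary testable without constructing real vectors; the actual 
change may well keep the `switch` inline in `accumulateInputEncodingCounts`.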



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

