wgtmac commented on code in PR #34416:
URL: https://github.com/apache/arrow/pull/34416#discussion_r1125476231


##########
cpp/src/arrow/adapters/orc/adapter_test.cc:
##########
@@ -840,6 +840,23 @@ TEST_F(TestORCWriterSingleArray, WriteListOfMap) {
   AssertArrayWriteReadEqual(array, array, kDefaultSmallMemStreamSize * 10);
 }
 
+TEST_F(TestORCWriterSingleArray, WriteSparseUnion) {
+  const int64_t num_rows = 1024;
+  auto type =
+      sparse_union({field("_union_0", utf8()), field("_union_1", int32())}, {0, 1});
+  auto array = checked_pointer_cast<SparseUnionArray>(rand.ArrayOf(type, num_rows, 0.4));
+  ArrayVector children;
+  for (int i = 0; i < array->num_fields(); ++i) {
+    ASSERT_OK_AND_ASSIGN(auto flattened_child, array->GetFlattenedField(i));
+    children.emplace_back(std::move(flattened_child));
+  }
+  auto flattened_array = std::make_shared<SparseUnionArray>(
+      array->type(), array->length(), std::move(children), array->type_codes(),
+      array->offset());

Review Comment:
   The random array generator fills the unselected slots in the child arrays of
a `SparseUnionArray` with random values. However, an ORC file only stores a
dense union type, so these unselected values are never written to the file, and
we can never read them back and compare them for equality in the unit test. In
this PR, I set the unselected values to null when reading from the file. So
flattening the `SparseUnionArray` before writing makes the roundtrip equality
check of `SparseUnionArray` straightforward.
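
   To illustrate the reasoning, here is a minimal, library-free sketch. It does
not use the Arrow or ORC APIs: `ToySparseUnion`, `ToyDenseUnion`, `Flatten`,
`ToDense`, `ToSparse`, and `RoundTripThroughDense` are hypothetical names that
only model the idea, assuming a two-child union where type id 0 selects the
string child and any other id selects the int32 child.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a sparse union: every child has one slot per row,
// and type_ids[i] says which child row i actually selects.
struct ToySparseUnion {
  std::vector<int8_t> type_ids;
  std::vector<std::optional<std::string>> strings;  // child 0
  std::vector<std::optional<int32_t>> ints;         // child 1
};

// Hypothetical model of a dense union: each child holds only selected rows.
struct ToyDenseUnion {
  std::vector<int8_t> type_ids;
  std::vector<std::optional<std::string>> strings;
  std::vector<std::optional<int32_t>> ints;
};

bool operator==(const ToySparseUnion& a, const ToySparseUnion& b) {
  return a.type_ids == b.type_ids && a.strings == b.strings && a.ints == b.ints;
}

// Null out every slot that its row does not select.
ToySparseUnion Flatten(const ToySparseUnion& in) {
  assert(in.strings.size() == in.type_ids.size());
  assert(in.ints.size() == in.type_ids.size());
  ToySparseUnion out = in;
  for (std::size_t i = 0; i < in.type_ids.size(); ++i) {
    if (in.type_ids[i] == 0) {
      out.ints[i].reset();
    } else {
      out.strings[i].reset();
    }
  }
  return out;
}

// "Write": only the selected value of each row survives.
ToyDenseUnion ToDense(const ToySparseUnion& in) {
  ToyDenseUnion out;
  out.type_ids = in.type_ids;
  for (std::size_t i = 0; i < in.type_ids.size(); ++i) {
    if (in.type_ids[i] == 0) {
      out.strings.push_back(in.strings[i]);
    } else {
      out.ints.push_back(in.ints[i]);
    }
  }
  return out;
}

// "Read": rebuild one slot per row, filling unselected slots with null.
ToySparseUnion ToSparse(const ToyDenseUnion& in) {
  ToySparseUnion out;
  out.type_ids = in.type_ids;
  std::size_t next_string = 0;
  std::size_t next_int = 0;
  for (int8_t id : in.type_ids) {
    if (id == 0) {
      out.strings.push_back(in.strings[next_string++]);
      out.ints.push_back(std::nullopt);
    } else {
      out.strings.push_back(std::nullopt);
      out.ints.push_back(in.ints[next_int++]);
    }
  }
  return out;
}

ToySparseUnion RoundTripThroughDense(const ToySparseUnion& in) {
  return ToSparse(ToDense(in));
}
```

   In this model, an input whose unselected slots hold arbitrary values does
not survive the round trip unchanged, while its flattened form does, which is
why the test compares against the flattened array.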



