Re: [PR] GH-48198: [C++][Parquet] Fix all the testcase issues to enable Parquet DB support on s390x [arrow]

via GitHub Wed, 03 Jun 2026 14:34:12 -0700


Copilot commented on code in PR #48200:
URL: https://github.com/apache/arrow/pull/48200#discussion_r3352060491



##########
cpp/src/parquet/types_test.cc:
##########
@@ -77,18 +77,17 @@ TEST(TypePrinter, StatisticsTypes) {
   std::string smax;
   int32_t int_min = 1024;
   int32_t int_max = 2048;
-  smin = std::string(reinterpret_cast<char*>(&int_min), sizeof(int32_t));
-  smax = std::string(reinterpret_cast<char*>(&int_max), sizeof(int32_t));
-  ASSERT_STREQ("1024", FormatStatValue(Type::INT32, smin).c_str());
-  ASSERT_STREQ("2048", FormatStatValue(Type::INT32, smax).c_str());
+  int32_t int_min_le = ::arrow::bit_util::ToLittleEndian(int_min);
+  int32_t int_max_le = ::arrow::bit_util::ToLittleEndian(int_max);
+  smin = std::string(reinterpret_cast<char*>(&int_min_le), sizeof(int32_t));

Review Comment:
   The INT32/INT64 sections build little-endian encoded min/max strings but no 
longer assert the formatted output, so these code paths are currently untested 
(and the values are immediately overwritten). Please restore the assertions so 
this test continues validating numeric statistic formatting on all platforms.



##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1608,16 +1608,17 @@ void TestByteStreamSplitEncoding<Type>::CheckDecode() {
     // INT32, FLOAT
     const std::vector<uint8_t> data{0x11, 0x22, 0x33, 0x44, 0x55, 0x66,
                                     0x77, 0x88, 0x99, 0xAA, 0xBB, 0xCC};
-    const auto expected_output =
-        ToLittleEndian<uint32_t>({0xAA774411U, 0xBB885522U, 0xCC996633U});
-    CheckDecode(std::span{data}, std::span{expected_output});
+    // Values are in native byte order after decoding
+    const std::vector<uint32_t> expected_output{0xAA774411U, 0xBB885522U, 
0xCC996633U};
+    CheckDecode(span{data}, span{expected_output});

Review Comment:
   `span{...}` is not in scope here (the file includes `<span>` but does not 
introduce an unqualified `span` symbol). This will fail to compile; use 
`std::span{...}` like the other calls in this test.



##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -1273,11 +1274,16 @@ void EncodeLevels(Encoding::type encoding, int16_t 
max_level, int num_levels,
   // encode levels
   if (encoding == Encoding::RLE) {
     // leave space to write the rle length value
+#if ARROW_LITTLE_ENDIAN
     encoder.Init(encoding, max_level, num_levels, bytes.data() + 
sizeof(int32_t),
                  static_cast<int>(bytes.size()));
-
+#else
+    encoder.Init(encoding, max_level, num_levels, bytes.data() + 
sizeof(int32_t),

Review Comment:
   `LevelEncoder::Init`'s `data_size` parameter should reflect the remaining 
buffer size after the length prefix. This code currently uses `bytes.size()` on 
little-endian but `bytes.size() - sizeof(int32_t)` on big-endian, which makes 
behavior platform-dependent and risks the encoder writing past the end of the 
`bytes` vector. Pass `bytes.size() - sizeof(int32_t)` unconditionally (matching 
other call sites like parquet/column_writer.cc).



##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1608,16 +1608,17 @@ void TestByteStreamSplitEncoding<Type>::CheckDecode() {
     // INT32, FLOAT
     const std::vector<uint8_t> data{0x11, 0x22, 0x33, 0x44, 0x55, 0x66,
                                     0x77, 0x88, 0x99, 0xAA, 0xBB, 0xCC};
-    const auto expected_output =
-        ToLittleEndian<uint32_t>({0xAA774411U, 0xBB885522U, 0xCC996633U});
-    CheckDecode(std::span{data}, std::span{expected_output});
+    // Values are in native byte order after decoding
+    const std::vector<uint32_t> expected_output{0xAA774411U, 0xBB885522U, 
0xCC996633U};
+    CheckDecode(span{data}, span{expected_output});
   } else {
     // INT64, DOUBLE
     const std::vector<uint8_t> data{0xDE, 0xC0, 0x37, 0x13, 0x11, 0x22, 0x33, 
0x44,
                                     0xAA, 0xBB, 0xCC, 0xDD, 0x55, 0x66, 0x77, 
0x88};
-    const auto expected_output =
-        ToLittleEndian<uint64_t>({0x7755CCAA331137DEULL, 
0x8866DDBB442213C0ULL});
-    CheckDecode(std::span{data}, std::span{expected_output});
+    // Values are in native byte order after decoding
+    const std::vector<uint64_t> expected_output{0x7755CCAA331137DEULL,
+                                                0x8866DDBB442213C0ULL};
+    CheckDecode(span{data}, span{expected_output});

Review Comment:
   `span{...}` is not in scope here (only `std::span` is available). Use 
`std::span{...}` to avoid a compile error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-48198: [C++][Parquet] Fix all the testcase issues to enable Parquet DB support on s390x [arrow]

Reply via email to