emkornfield commented on a change in pull request #10627:
URL: https://github.com/apache/arrow/pull/10627#discussion_r664776263



##########
File path: cpp/src/parquet/reader_test.cc
##########
@@ -322,6 +323,27 @@ TEST(TestFileReaderAdHoc, NationDictTruncatedDataPage) {
   ASSERT_EQ(ss2.str(), ss.str());
 }
 
+TEST(TestFileReaderEncoding, DeltaBinaryPacked) {
+  // Parquet file(delta_binary_packed.parquet) is generated by parquet-mr version 1.10.0.
+  // There are 66 columns in total and their encoding type is DELTA_BINARY_PACKED. The
+  // data type of the first 65 columns is bigint and their bit width ranges from 0 to 64.
+  // The data type of the last column is int.
+  std::list<int> columns;
+
+  std::stringstream ss_values;
+  const char* file = "delta_binary_packed.parquet";
+  auto reader_props = default_reader_properties();
+  auto reader = ParquetFileReader::OpenFile(data_file(file), false, reader_props);
+  ParquetFilePrinter printer(reader.get());
+  printer.DebugPrint(ss_values, columns, true, false, false, file);
+
+  std::ifstream in(data_file("delta_binary_packed.parquet.expect"));

Review comment:
       This test seems very brittle. Storing the expected values as CSV, JSON,
or some other format and comparing the column data directly, instead of
comparing the printed string representation, seems like a better idea?
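
To make the suggestion concrete, here is a minimal sketch of what comparing typed values could look like. It uses only the standard library; `ParseCsvRow` and `FirstMismatch` are hypothetical helper names, the one-row-per-column CSV layout is an assumption, and the `actual` vector stands in for values that the real test would read through the Parquet column readers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one CSV line of integers into typed values.
// Assumes every cell is a valid base-10 integer that fits in int64_t.
std::vector<int64_t> ParseCsvRow(const std::string& line) {
  std::vector<int64_t> values;
  std::stringstream ss(line);
  std::string cell;
  while (std::getline(ss, cell, ',')) {
    values.push_back(std::stoll(cell));
  }
  return values;
}

// Hypothetical helper: index of the first differing element, the length of
// the shorter vector if only the sizes differ, or -1 if both are equal.
std::ptrdiff_t FirstMismatch(const std::vector<int64_t>& expected,
                             const std::vector<int64_t>& actual) {
  const std::size_t n = std::min(expected.size(), actual.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (expected[i] != actual[i]) return static_cast<std::ptrdiff_t>(i);
  }
  if (expected.size() != actual.size()) return static_cast<std::ptrdiff_t>(n);
  return -1;
}
```

A test built this way could assert `FirstMismatch(expected, actual) == -1` per column and report the offending index on failure, so a change in the printer's formatting would no longer break it.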




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
