lidavidm commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r927983874


##########
cpp/src/parquet/column_reader.h:
##########
@@ -105,10 +105,12 @@ class PARQUET_EXPORT PageReader {
 
   static std::unique_ptr<PageReader> Open(
       std::shared_ptr<ArrowInputStream> stream, int64_t total_num_rows,
-      Compression::type codec, ::arrow::MemoryPool* pool = ::arrow::default_memory_pool(),
+      Compression::type codec, bool compression_always_true,

Review Comment:
   nit: default to false?



##########
cpp/src/parquet/column_io_benchmark.cc:
##########
@@ -130,7 +130,8 @@ std::shared_ptr<Int64Reader> BuildReader(std::shared_ptr<Buffer>& buffer,
                                          int64_t num_values, Compression::type codec,
                                          ColumnDescriptor* schema) {
   auto source = std::make_shared<::arrow::io::BufferReader>(buffer);
-  std::unique_ptr<PageReader> page_reader = PageReader::Open(source, num_values, codec);
+  std::unique_ptr<PageReader> page_reader =
+      PageReader::Open(source, num_values, codec, false);

Review Comment:
   nit: add `/*param_name=*/ false` so readers can more easily tell what's going on
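
   To illustrate this nit together with the earlier suggestion to default the flag to false, here is a minimal sketch. `DescribeOpen` and the parameter name `always_compressed` are hypothetical stand-ins, not the real `PageReader::Open` signature; the point is only that a defaulted trailing `bool` leaves old call sites untouched, and an inline argument comment names the flag where it is passed explicitly.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a factory that gained a trailing bool parameter.
// Defaulting it to false keeps existing two-argument call sites compiling.
std::string DescribeOpen(int64_t total_num_rows, int codec,
                         bool always_compressed = false) {
  return "rows=" + std::to_string(total_num_rows) +
         " codec=" + std::to_string(codec) +
         " always_compressed=" + (always_compressed ? "true" : "false");
}

// Existing call site: unchanged, picks up the default.
std::string WithDefault() { return DescribeOpen(100, 1); }

// New call site: the inline comment names the bare boolean for the reader.
std::string WithNamedArg() {
  return DescribeOpen(100, 1, /*always_compressed=*/true);
}
```

   Note that in C++ a default can only be given if every parameter after it also has one, so the position of the new flag relative to the defaulted `pool` argument matters.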



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   This also needs a SKIP like the one for Snappy above



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   FWIW, everything else in `parquet` uses PARQUET_TEST_DATA… should it have gone there instead?



##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -85,7 +85,7 @@ class TestPrimitiveWriter : public PrimitiveTypedTest<TestType> {
     ASSERT_OK_AND_ASSIGN(auto buffer, sink_->Finish());
     auto source = std::make_shared<::arrow::io::BufferReader>(buffer);
     std::unique_ptr<PageReader> page_reader =
-        PageReader::Open(std::move(source), num_rows, compression);
+        PageReader::Open(std::move(source), num_rows, compression, false);

Review Comment:
   ditto the comment above here (though: adding the default would also fix it)



##########
cpp/src/parquet/column_reader.cc:
##########
@@ -449,7 +452,10 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
           header.repetition_levels_byte_length < 0) {
         throw ParquetException("Invalid page header (negative levels byte length)");
       }
-      bool is_compressed = header.__isset.is_compressed ? header.is_compressed : false;
+      // Some implementations set is_compressed to false but still compressed.

Review Comment:
   "Some implementations" -> Specifically, Arrow prior to 3.0.0?
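
   For context, the decision being commented on can be sketched as below. This is an assumption about how the new flag combines with the header field, not the PR's actual code: `HeaderV2Flags`, `ResolveIsCompressed`, and `always_compressed` are hypothetical names, and the fallback to false when the field is unset simply mirrors the removed line in the hunk above.

```cpp
// Hypothetical, simplified stand-in for the relevant Thrift page header fields.
struct HeaderV2Flags {
  bool has_is_compressed;  // mirrors header.__isset.is_compressed
  bool is_compressed;      // mirrors header.is_compressed
};

// Decide whether the page body should be treated as compressed.
// `always_compressed` is an assumed override for files whose writer
// recorded is_compressed = false even though the data was compressed.
bool ResolveIsCompressed(const HeaderV2Flags& header, bool always_compressed) {
  if (always_compressed) {
    return true;  // Ignore the header flag for known-affected writers.
  }
  // Otherwise trust the header, falling back to false when the field is unset.
  return header.has_is_compressed ? header.is_compressed : false;
}
```

   Naming the affected writer and version in the code comment, as suggested, would tell readers when the override is expected to apply.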


