mapleFU commented on code in PR #39570:
URL: https://github.com/apache/arrow/pull/39570#discussion_r1449224979


##########
cpp/src/parquet/reader_test.cc:
##########
@@ -120,11 +120,27 @@ std::string concatenated_gzip_members() {
   return data_file("concatenated_gzip_members.parquet");
 }
 
+std::string byte_stream_split() { return data_file("byte_stream_split.zstd.parquet"); }
+
+template <typename DType, typename ValueType = typename DType::c_type>
+std::vector<ValueType> ReadColumnValues(ParquetFileReader* file_reader, int row_group,
+                                        int column, int64_t expected_values_read) {
+  auto column_reader = checked_pointer_cast<TypedColumnReader<DType>>(
+      file_reader->RowGroup(row_group)->Column(column));
+  std::vector<ValueType> values(expected_values_read);
+  int64_t values_read;
+  auto levels_read = column_reader->ReadBatch(expected_values_read, nullptr, nullptr,

Review Comment:
   The schema is:
   
   ```
   required group field_id=-1 schema {
     optional float field_id=-1 f32;
     optional double field_id=-1 f64;
   }
   ```
   
   So the data has def-levels, and this read without `def-levels` forces it to read 300 values without querying the levels. I don't know whether this is expected. Should we make the batch size larger and read with def-levels to check that the column only has 300 values? Or is this exactly what we need (just reading 300 values here)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
