pitrou commented on code in PR #39570:
URL: https://github.com/apache/arrow/pull/39570#discussion_r1449237601
##########
cpp/src/parquet/reader_test.cc:
##########
@@ -120,11 +120,27 @@ std::string concatenated_gzip_members() {
return data_file("concatenated_gzip_members.parquet");
}
+std::string byte_stream_split() { return data_file("byte_stream_split.zstd.parquet"); }
+
+template <typename DType, typename ValueType = typename DType::c_type>
+std::vector<ValueType> ReadColumnValues(ParquetFileReader* file_reader, int row_group,
+                                        int column, int64_t expected_values_read) {
+  auto column_reader = checked_pointer_cast<TypedColumnReader<DType>>(
+      file_reader->RowGroup(row_group)->Column(column));
+  std::vector<ValueType> values(expected_values_read);
+  int64_t values_read;
+  auto levels_read = column_reader->ReadBatch(expected_values_read, nullptr, nullptr,
Review Comment:
I see. Well, it may have definition levels, but it certainly has zero nulls,
given how it was generated :-)
https://github.com/apache/parquet-testing/blob/master/data/README.md#byte-stream-split
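To illustrate why "has definition levels" and "has nulls" are separate questions, here is a minimal sketch assuming a flat OPTIONAL column (maximum definition level 1): every slot gets a definition level, and only slots at the maximum carry a value. The helper `count_levels_and_values` is hypothetical, not a Parquet API; it only mimics the distinction between the levels count that `ReadBatch` returns and the `values_read` count it writes out.

```python
from typing import List, Tuple


def count_levels_and_values(def_levels: List[int], max_def_level: int = 1) -> Tuple[int, int]:
    """Hypothetical helper for a flat (non-nested) OPTIONAL column.

    levels_read counts every slot, null or not; values_read counts only the
    slots whose definition level reaches max_def_level, i.e. non-null values.
    """
    levels_read = len(def_levels)
    values_read = sum(1 for level in def_levels if level == max_def_level)
    return levels_read, values_read


# OPTIONAL column with no nulls: definition levels are present, but every one
# is at the maximum, so both counts agree.
print(count_levels_and_values([1] * 300))

# Same column with two nulls: levels_read is unchanged, values_read drops.
print(count_levels_and_values([1] * 298 + [0, 0]))
```

So for a file generated without nulls, a test can reasonably expect the two counts to match even when definition levels are physically stored.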
If I generate the data directly, I get the same values too:
```python
>>> import numpy as np
>>> import pyarrow as pa
>>> np.random.seed(0)
>>> table = pa.Table.from_pydict({
...     'f32': np.random.normal(size=300).astype(np.float32),
...     'f64': np.random.normal(size=300).astype(np.float64),
... })
>>> table
pyarrow.Table
f32: float
f64: double
----
f32:
[[1.7640524,0.4001572,0.978738,2.2408931,1.867558,...,1.1368914,0.09772497,0.5829537,-0.39944902,0.37005588]]
f64:
[[-1.3065268517353166,1.658130679618188,-0.11816404512856976,-0.6801782039968504,0.6663830820319143,...,0.37923553353558676,-0.4700328827008748,-0.21673147057553863,-0.9301565025243212,-0.17858909208732915]]
```
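For context on what the test file exercises (hence its name): BYTE_STREAM_SPLIT scatters the K bytes of each fixed-width value into K separate streams, stream k holding byte k of every value, which tends to make the data more compressible for a codec such as ZSTD. The pure-Python sketch below handles float32 only; the function names are hypothetical and it is not how Arrow implements the encoding.

```python
import struct
from typing import List

FLOAT32_WIDTH = 4  # bytes per float32 value


def byte_stream_split_encode(values: List[float]) -> bytes:
    """Sketch of BYTE_STREAM_SPLIT for float32: stream k holds byte k of every value."""
    n = len(values)
    raw = struct.pack(f"<{n}f", *values)  # little-endian, 4 bytes per value
    # Emit all byte-0s, then all byte-1s, and so on (outer loop over byte index).
    return bytes(
        raw[i * FLOAT32_WIDTH + k] for k in range(FLOAT32_WIDTH) for i in range(n)
    )


def byte_stream_split_decode(data: bytes) -> List[float]:
    """Inverse of the sketch above: re-interleave the streams and unpack."""
    n = len(data) // FLOAT32_WIDTH
    # Byte k of value i sits at offset k * n + i in the encoded buffer.
    raw = bytes(
        data[k * n + i] for i in range(n) for k in range(FLOAT32_WIDTH)
    )
    return list(struct.unpack(f"<{n}f", raw))


# First few f32 values from the generated table above.
sample = [1.7640524, 0.4001572, 0.978738]
encoded = byte_stream_split_encode(sample)
decoded = byte_stream_split_decode(encoded)
print(encoded.hex())
print(decoded)
```

Decoding gives back the sample values rounded to float32 precision; only the byte layout changes, not the values.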
--
This is an automated message from the Apache Git Service.