mapleFU commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1229611425
##########
cpp/src/parquet/arrow/reader.cc:
##########
@@ -462,7 +463,8 @@ class LeafReader : public ColumnReaderImpl {
        input_(std::move(input)),
        descr_(input_->descr()) {
    record_reader_ = RecordReader::Make(
Review Comment:
Yes, in fact reading from Parquet is a bit messy and hacky. Some users only use
the low-level APIs such as `ColumnWriter` and `ColumnReader`, so they only see
the record count and the binary data as "raw records".
Other users use the `parquet::arrow` reader, which assembles those "raw
records" into Arrow format.
Maybe we should add a test like:
```c++
void CheckReadValues(int start, int end) {
  auto binary_reader = dynamic_cast<BinaryRecordReader*>(record_reader_.get());
  ASSERT_NE(binary_reader, nullptr);
  // Chunks are reset after this call.
  ::arrow::ArrayVector array_vector = binary_reader->GetBuilderChunks();
  ASSERT_EQ(array_vector.size(), 1);
  auto binary_array =
      dynamic_cast<::arrow::FixedSizeBinaryArray*>(array_vector[0].get());
  ASSERT_NE(binary_array, nullptr);
  ASSERT_EQ(binary_array->length(), record_reader_->values_written());
  if (read_dense_for_nullable()) {
    ASSERT_EQ(binary_array->null_count(), 0);
    ASSERT_EQ(record_reader_->null_count(), 0);
  } else {
    ASSERT_EQ(binary_array->null_count(), record_reader_->null_count());
  }
  std::vector<std::string_view> expected = expected_values(start, end);
  for (size_t i = 0; i < expected.size(); ++i) {
    if (def_levels_[i + start] == 0) {
      ASSERT_EQ(!read_dense_for_nullable(), binary_array->IsNull(i));
    } else {
      ASSERT_EQ(expected[i].compare(binary_array->GetView(i)), 0);
      ASSERT_FALSE(binary_array->IsNull(i));
    }
  }
}
```
And ensure that it returns the "Large" type.
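To make that last point concrete, here is a minimal sketch of what such a
check could look like, assuming "Large" means e.g. `::arrow::LargeBinaryArray`;
the helper name is made up:
```c++
#include <memory>

#include <gtest/gtest.h>

#include "arrow/array.h"
#include "arrow/type.h"

// Hypothetical helper: assert that an assembled chunk is the "Large"
// (64-bit offset) binary variant rather than the regular Binary type.
void CheckIsLargeBinary(const std::shared_ptr<::arrow::Array>& chunk) {
  ASSERT_EQ(chunk->type_id(), ::arrow::Type::LARGE_BINARY);
  auto large_array =
      std::dynamic_pointer_cast<::arrow::LargeBinaryArray>(chunk);
  ASSERT_NE(large_array, nullptr);
}
```
This could be called on `array_vector[0]` right after `GetBuilderChunks()` in
the snippet above.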