mapleFU commented on issue #34722:
URL: https://github.com/apache/arrow/issues/34722#issuecomment-1482995057
The interface:
```c++
// Abstract page iterator interface. This way, we can feed column pages to the
// ColumnReader through whatever mechanism we choose
class PARQUET_EXPORT PageReader {
  using DataPageFilter = std::function<bool(const DataPageStats&)>;

 public:
  virtual ~PageReader() = default;

  // @returns: shared_ptr<Page>(nullptr) on EOS, std::shared_ptr<Page>
  // containing new Page otherwise
  virtual std::shared_ptr<Page> NextPage() = 0;
```
The actual implementation:
```c++
// This subclass delimits pages appearing in a serialized stream, each preceded
// by a serialized Thrift format::PageHeader indicating the type of each page
// and the page metadata.
class SerializedPageReader : public PageReader {
 public:
  SerializedPageReader(std::shared_ptr<ArrowInputStream> stream, int64_t total_num_values,
                       Compression::type codec, const ReaderProperties& properties,
                       const CryptoContext* crypto_ctx, bool always_compressed)
      : properties_(properties),
        stream_(std::move(stream)),
        decompression_buffer_(AllocateBuffer(properties_.memory_pool(), 0)),
        decryption_buffer_(AllocateBuffer(properties_.memory_pool(), 0)) {}
```
The page returned by `NextPage()` (implicitly) depends on `SerializedPageReader`'s internal buffers (`decompression_buffer_` and `decryption_buffer_`).
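To illustrate the kind of implicit dependency meant here, below is a minimal, self-contained sketch. It does not use the real Parquet classes: `ToyPage`, `ToyPageReader`, and `CopyPageBytes` are hypothetical names, and the single reused scratch buffer is an assumption standing in for the reader-owned buffers. It only shows the general hazard of a returned page that views memory owned by the reader.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a page: a non-owning view into the reader's buffer.
struct ToyPage {
  const std::uint8_t* data;
  std::size_t size;
};

// Hypothetical stand-in for a page reader that reuses one internal buffer.
class ToyPageReader {
 public:
  explicit ToyPageReader(std::vector<std::vector<std::uint8_t>> pages)
      : pages_(std::move(pages)) {
    std::size_t max_size = 0;
    for (const auto& p : pages_) max_size = std::max(max_size, p.size());
    scratch_.resize(max_size);  // sized once, then reused for every page
  }

  // Returns nullptr at end of stream, otherwise a page viewing scratch_.
  std::shared_ptr<ToyPage> NextPage() {
    if (next_ >= pages_.size()) return nullptr;
    const auto& src = pages_[next_++];
    // Overwrites whatever the previously returned page was pointing at.
    std::copy(src.begin(), src.end(), scratch_.begin());
    return std::make_shared<ToyPage>(ToyPage{scratch_.data(), src.size()});
  }

 private:
  std::vector<std::vector<std::uint8_t>> pages_;
  std::size_t next_ = 0;
  std::vector<std::uint8_t> scratch_;
};

// Copy the bytes out if they must stay valid across the next NextPage() call.
std::vector<std::uint8_t> CopyPageBytes(const ToyPage& page) {
  return std::vector<std::uint8_t>(page.data, page.data + page.size);
}
```

In this sketch, holding on to the `shared_ptr` does not keep the bytes stable: once `NextPage()` is called again, the earlier page sees the new page's contents. A caller that needs the old bytes has to copy them first, e.g. with `CopyPageBytes`.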