jp0317 commented on code in PR #39153:
URL: https://github.com/apache/arrow/pull/39153#discussion_r1430772245
##########
cpp/src/parquet/column_reader.cc:
##########
@@ -1369,6 +1369,26 @@ class TypedRecordReader : public TypedColumnReaderImpl<DType>,
return bytes_for_values;
}
+ const void* ReadDictionary(int32_t* dictionary_length) override {
+ if (this->current_decoder_ == nullptr && !this->HasNextInternal()) {
+ *dictionary_length = 0;
+ return nullptr;
+ }
+ // Verify the current data page is dictionary encoded. The current_encoding_ should
+ // have been set as RLE_DICTIONARY if the page encoding is RLE_DICTIONARY or
+ // PLAIN_DICTIONARY.
+ if (this->current_encoding_ != Encoding::RLE_DICTIONARY) {
+ std::stringstream ss;
+ ss << "Data page is not dictionary encoded. Encoding: "
Review Comment:
Maybe it's better to leave this as a user choice (callers can still print the
descr themselves using the existing API)? I don't have a strong preference, but
let me know if you prefer always printing it.
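
To illustrate the "user choice" option, here is a rough sketch of a caller appending the descriptor text to the error only when it wants to. `FakeDescr`, `FakeReader`, and `ReadDictionaryOrDescribe` are hypothetical stand-ins invented for this example, not the real Parquet classes or any existing Arrow API, and `std::runtime_error` stands in for whatever exception the reader actually throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a column descriptor (not parquet::ColumnDescriptor).
struct FakeDescr {
  std::string path;
  std::string ToString() const { return "column: " + path; }
};

// Hypothetical stand-in for the record reader. Its error message is kept
// short and does not include the descriptor.
struct FakeReader {
  FakeDescr descr;
  bool dictionary_encoded;

  const void* ReadDictionary(int* dictionary_length) const {
    if (!dictionary_encoded) {
      throw std::runtime_error("Data page is not dictionary encoded.");
    }
    *dictionary_length = 0;  // write through the out-parameter
    return nullptr;          // the stand-in holds no real dictionary
  }
};

// The caller opts in to the extra context by appending the descriptor
// text to the reader's message when a read fails.
std::string ReadDictionaryOrDescribe(const FakeReader& reader) {
  int length = 0;
  try {
    reader.ReadDictionary(&length);
    return "ok";
  } catch (const std::runtime_error& e) {
    return std::string(e.what()) + " " + reader.descr.ToString();
  }
}
```

With this shape the reader's message stays short, and callers that need the column context pay for it only on their own error path.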