jp0317 commented on code in PR #39153:
URL: https://github.com/apache/arrow/pull/39153#discussion_r1421819720
##########
cpp/src/parquet/file_reader.cc:
##########
@@ -61,6 +61,34 @@ static constexpr uint32_t kFooterSize = 8;
// For PARQUET-816
static constexpr int64_t kMaxDictHeaderSize = 100;
+bool IsColumnChunkFullyDictionaryEncoded(const ColumnChunkMetaData& col) {
Review Comment:
done, thanks!
##########
cpp/src/parquet/file_reader.h:
##########
@@ -80,6 +81,18 @@ class PARQUET_EXPORT RowGroupReader {
std::shared_ptr<ColumnReader> ColumnWithExposeEncoding(
int i, ExposedEncoding encoding_to_expose);
+ // Construct a RecordReader, trying to enable exposed encoding.
+ //
+ // For dictionary encoding, currently we only support column chunks that are
+ // fully dictionary encoded byte arrays. The caller can verify if the reader can read
+ // and expose the dictionary by checking the reader's read_dictionary(). If a column
+ // chunk uses dictionary encoding but then falls back to plain encoding, the returned
+ // reader will read decoded data without exposing the dictionary.
Review Comment:
If the chunk falls back to plain encoding, a normal reader is returned that
does not read the dictionary. I reworded the comment to state that the caller
should verify the reader using read_dictionary().
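
For illustration only, here is a minimal self-contained sketch of the fallback behavior described above. It does not use the real parquet-cpp classes: `PageEncodingStat`, `IsFullyDictionaryEncoded`, `SketchReader`, and `MakeSketchReader` are hypothetical stand-ins, and the rule "first page is a dictionary page, every later page is a dictionary-encoded data page" is my assumption about what "fully dictionary encoded" means here, not a copy of the PR's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the real parquet-cpp types.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };
enum class PageType { DICTIONARY_PAGE, DATA_PAGE };

struct PageEncodingStat {
  PageType page_type;
  Encoding encoding;
};

// Assumed rule: the chunk starts with a dictionary page and every page after
// it is a data page that uses a dictionary encoding.
bool IsFullyDictionaryEncoded(const std::vector<PageEncodingStat>& stats) {
  if (stats.empty() || stats.front().page_type != PageType::DICTIONARY_PAGE) {
    return false;
  }
  for (std::size_t i = 1; i < stats.size(); ++i) {
    const PageEncodingStat& s = stats[i];
    if (s.page_type != PageType::DATA_PAGE) return false;
    if (s.encoding != Encoding::RLE_DICTIONARY &&
        s.encoding != Encoding::PLAIN_DICTIONARY) {
      return false;  // e.g. a data page that fell back to PLAIN
    }
  }
  return true;
}

// Stand-in reader: exposes the dictionary only when the chunk qualifies,
// otherwise it behaves like a normal reader returning decoded values.
class SketchReader {
 public:
  explicit SketchReader(bool read_dictionary) : read_dictionary_(read_dictionary) {}
  bool read_dictionary() const { return read_dictionary_; }

 private:
  bool read_dictionary_;
};

// Mirrors the documented contract: construction always succeeds, and the
// caller checks read_dictionary() to learn which kind of reader it got.
SketchReader MakeSketchReader(const std::vector<PageEncodingStat>& stats) {
  return SketchReader(IsFullyDictionaryEncoded(stats));
}
```

A caller would construct the reader and then branch on `read_dictionary()`: consume dictionary indices when it returns true, and decoded values otherwise.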
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.