wgtmac commented on code in PR #14556:
URL: https://github.com/apache/arrow/pull/14556#discussion_r1014841622
##########
cpp/src/parquet/printer.cc:
##########
@@ -39,6 +39,25 @@ namespace parquet {
class ColumnReader;
+namespace {
+
+void PrintPageEncodingStats(std::ostream& stream,
+ const std::vector<PageEncodingStats>& encoding_stats) {
+ for (size_t i = 0; i < encoding_stats.size(); ++i) {
+ const auto& encoding = encoding_stats.at(i);
+ stream << EncodingToString(encoding.encoding);
+ if (encoding.page_type == parquet::PageType::DICTIONARY_PAGE) {
+ // Explicitly tell if this encoding comes from a dictionary page
+ stream << "(DICT_PAGE)";
Review Comment:
The main idea is to make it clear that this encoding comes from a dictionary
page. IIUC, in Parquet 1.0 both the dictionary page and the data pages use
PLAIN_DICTIONARY when dictionary encoding is applied, whereas in Parquet 2.0
the dictionary page uses PLAIN and the data pages use RLE_DICTIONARY. Without
the page type, it is therefore difficult to tell whether a PLAIN_DICTIONARY or
PLAIN encoding comes from a dictionary page or a data page. Please see this
for details:
https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8
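
To illustrate the ambiguity, here is a minimal, self-contained sketch of the tagging idea. It is not the code in this PR: the `Encoding` and `PageType` enums and the `PageEncodingStats` struct are simplified, hypothetical stand-ins for the Parquet types so that the snippet compiles without Arrow, `FormatPageEncodingStats` is an illustrative helper name, and the space separator between entries is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for parquet::Encoding and parquet::PageType.
// They only cover the values discussed in this comment.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };
enum class PageType { DATA_PAGE, DICTIONARY_PAGE };

// Simplified stand-in for the per-page-type encoding statistics.
struct PageEncodingStats {
  PageType page_type;
  Encoding encoding;
};

std::string EncodingToString(Encoding encoding) {
  switch (encoding) {
    case Encoding::PLAIN:
      return "PLAIN";
    case Encoding::PLAIN_DICTIONARY:
      return "PLAIN_DICTIONARY";
    case Encoding::RLE_DICTIONARY:
      return "RLE_DICTIONARY";
  }
  return "UNKNOWN";
}

// Joins the encodings with a space (an assumed separator) and appends
// "(DICT_PAGE)" to every entry that was recorded for a dictionary page.
std::string FormatPageEncodingStats(
    const std::vector<PageEncodingStats>& encoding_stats) {
  std::ostringstream stream;
  bool first = true;
  for (const auto& stats : encoding_stats) {
    if (!first) stream << " ";
    first = false;
    stream << EncodingToString(stats.encoding);
    if (stats.page_type == PageType::DICTIONARY_PAGE) {
      stream << "(DICT_PAGE)";
    }
  }
  return stream.str();
}
```

With this sketch, a Parquet 1.0 style column chunk would print `PLAIN_DICTIONARY(DICT_PAGE) PLAIN_DICTIONARY`, and a Parquet 2.0 style chunk would print `PLAIN(DICT_PAGE) RLE_DICTIONARY`. Without the suffix, the first case would show the same encoding name twice, and in the second case the PLAIN entry could be mistaken for a plain-encoded data page.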