IMPALA-6076: Parquet BIT_PACKED deprecation warning

Every 100th time that we open a Parquet column with the deprecated
BIT_PACKED encoding, a warning is logged.

We do this per-column instead of per-file because Impala historically
listed the BIT_PACKED encoding in file metadata even when it wasn't used
for any columns - see IMPALA-5636.

Testing:
Manually tested by running a query repeatedly against a BIT_PACKED
sample file (which I created for my IMPALA-4177 patch). Ran
"tail -f logs/cluster/impalad.WARNING" and checked that the warning was
logged periodically.

Change-Id: I02dd4009089a264b28376492b1b40361d767d5d9
Reviewed-on: http://gerrit.cloudera.org:8080/8370
Reviewed-by: Lars Volker <[email protected]>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/c87ad363
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/c87ad363
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/c87ad363

Branch: refs/heads/master
Commit: c87ad3631a4f3f1854759937ae0f8de63cb6e5dc
Parents: 1640aa9
Author: Tim Armstrong <[email protected]>
Authored: Tue Oct 24 10:31:24 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Tue Oct 24 22:11:39 2017 +0000

----------------------------------------------------------------------
 be/src/exec/parquet-column-readers.cc | 7 +++++++
 1 file changed, 7 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/c87ad363/be/src/exec/parquet-column-readers.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/parquet-column-readers.cc b/be/src/exec/parquet-column-readers.cc
index ad12916..6d211a6 100644
--- a/be/src/exec/parquet-column-readers.cc
+++ b/be/src/exec/parquet-column-readers.cc
@@ -46,6 +46,9 @@ DEFINE_bool(convert_legacy_hive_parquet_utc_timestamps, false,
     "When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will "
     "be converted from UTC to local time. Writes are unaffected.");
 
+// Throttle deprecation warnings to - only print warning with this frequency.
+static const int BITPACKED_DEPRECATION_WARNING_FREQUENCY = 100;
+
 // Max data page header size in bytes. This is an estimate and only needs to be an upper
 // bound. It is theoretically possible to have a page header of any size due to string
 // value statistics, but in practice we'll have trouble reading string values this large.
@@ -100,6 +103,10 @@ Status ParquetLevelDecoder::Init(const string& filename,
     case parquet::Encoding::BIT_PACKED:
       num_bytes = BitUtil::Ceil(num_buffered_values, 8);
       bit_reader_.Reset(*data, num_bytes);
+      LOG_EVERY_N(WARNING, BITPACKED_DEPRECATION_WARNING_FREQUENCY)
+          << filename << " uses deprecated Parquet BIT_PACKED encoding for rep or def "
+          << "levels. This will be removed in the future - see IMPALA-6077. Warning "
+          << "every " << BITPACKED_DEPRECATION_WARNING_FREQUENCY << " occurrences.";
       break;
     default: {
       stringstream ss;
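
For readers unfamiliar with the throttling idea behind LOG_EVERY_N, here is a minimal standalone sketch of the same pattern using only the C++ standard library. It is not the glog implementation and not part of this patch: the names kWarningFrequency, ShouldLogEveryN and WarnBitPackedDeprecated are hypothetical, and the choice to fire on the 1st, (N+1)th, (2N+1)th, ... call is an assumption intended to resemble glog's documented behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for BITPACKED_DEPRECATION_WARNING_FREQUENCY.
static const int kWarningFrequency = 100;

// Returns true on the 1st, (n+1)th, (2n+1)th, ... call made with the same
// counter. The atomic increment keeps the count consistent when several
// threads reach the call site at once.
inline bool ShouldLogEveryN(std::atomic<long>& counter, int n) {
  assert(n > 0);
  return counter.fetch_add(1, std::memory_order_relaxed) % n == 0;
}

// Hypothetical helper: prints a throttled deprecation warning to stderr and
// reports whether this particular call actually printed anything.
inline bool WarnBitPackedDeprecated(const std::string& filename) {
  static std::atomic<long> occurrences{0};  // One counter per call site.
  if (!ShouldLogEveryN(occurrences, kWarningFrequency)) return false;
  std::cerr << "WARNING: " << filename
            << " uses deprecated Parquet BIT_PACKED encoding for rep or def"
            << " levels. Warning every " << kWarningFrequency
            << " occurrences." << std::endl;
  return true;
}
```

Because the counter belongs to the call site rather than to a file, the throttle applies across all files and columns handled by the process, which matches the per-column accounting described in the commit message.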
