Repository: incubator-impala
Updated Branches:
  refs/heads/master 15e6cf8fd -> b2dbcbc2d


IMPALA-5636: Change the metadata in parquet

When writing Parquet files, Impala does not use repetition levels.
However, the repetition level encoding recorded in the metadata is BIT_PACKED,
which is deprecated and may cause problems when the files are read by other
software. Changing it to RLE solves this issue.

Testing: This change was only tested manually.
To test with the default test data loaded:
> create table default.test like tpch_parquet.orders stored as parquet;
> insert into default.test values (0,0,"",0,"","","",0,"");
Then fetch "hdfs://localhost:20500/test-warehouse/test/*.parq" and use
$ java -jar parquet-tools-1.6.0.jar dump /home/tianyi/Downloads/*.parq | grep RLE:
to inspect the file. Before the change you would see output like
    page 0:              DLE:RLE RLE:BIT_PACKED VLE:PLA [more]... VC:1
and after the change it should be
    page 0:              DLE:RLE RLE:RLE VLE:PLA [more]... VC:1
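
The manual check above could also be scripted. The following is only a minimal
sketch, assuming the "page N:" lines of the parquet-tools dump output have the
whitespace-separated layout shown above. RepetitionLevelEncoding and
UsesDeprecatedRepetitionEncoding are hypothetical helper names; they are not
part of Impala or parquet-tools.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of Impala or parquet-tools): returns the value
// of the "RLE:" field from one "page N:" line of `parquet-tools dump` output,
// or an empty string if the line has no such field.
std::string RepetitionLevelEncoding(const std::string& line) {
  static const std::string kPrefix = "RLE:";
  std::istringstream tokens(line);
  std::string token;
  while (tokens >> token) {
    // Match only tokens that start with "RLE:", so that the "DLE:RLE" field
    // (the definition level encoding) is not mistaken for it.
    if (token.compare(0, kPrefix.size(), kPrefix) == 0) {
      return token.substr(kPrefix.size());
    }
  }
  return std::string();
}

// True if the page line reports the deprecated BIT_PACKED encoding for
// repetition levels.
bool UsesDeprecatedRepetitionEncoding(const std::string& line) {
  return RepetitionLevelEncoding(line) == "BIT_PACKED";
}
```

Feeding each line of the grep output to UsesDeprecatedRepetitionEncoding would
flag files written before this change, under the layout assumption stated above.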

Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Reviewed-on: http://gerrit.cloudera.org:8080/7514
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/b2dbcbc2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/b2dbcbc2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/b2dbcbc2

Branch: refs/heads/master
Commit: b2dbcbc2d1bb7d57c5f50989ad25eec1783e52b2
Parents: 15e6cf8
Author: Tianyi Wang <[email protected]>
Authored: Wed Jul 26 16:31:03 2017 -0700
Committer: Tim Armstrong <[email protected]>
Committed: Mon Jul 31 17:03:01 2017 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-parquet-table-writer.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b2dbcbc2/be/src/exec/hdfs-parquet-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-parquet-table-writer.cc b/be/src/exec/hdfs-parquet-table-writer.cc
index 04a81f1..237dd83 100644
--- a/be/src/exec/hdfs-parquet-table-writer.cc
+++ b/be/src/exec/hdfs-parquet-table-writer.cc
@@ -169,7 +169,6 @@ class HdfsParquetTableWriter::BaseColumnWriter {
     data_encoding_stats_.clear();
     // Repetition/definition level encodings are constant. Incorporate them here.
     column_encodings_.insert(Encoding::RLE);
-    column_encodings_.insert(Encoding::BIT_PACKED);
   }
 
   // Close this writer. This is only called after Flush() and no more rows will
@@ -738,7 +737,7 @@ void HdfsParquetTableWriter::BaseColumnWriter::NewPage() {
     // relies on these specific values for the definition/repetition level
     // encodings.
     header.definition_level_encoding = Encoding::RLE;
-    header.repetition_level_encoding = Encoding::BIT_PACKED;
+    header.repetition_level_encoding = Encoding::RLE;
     current_page_->header.__set_data_page_header(header);
   }
   current_encoding_ = next_page_encoding_;
