IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

This fix removes the MemPool::FreeAll() call from
HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush() is
incorrect because a memory pool buffer is cached by the compressor in the
table writer which isn't reset across calls to Flush(). If the file that is
being written is big enough, HdfsSequenceTableWriter::AppendRows() will call
Flush() multiple times causing memory corruption.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Reviewed-on: http://gerrit.cloudera.org:8080/7394
Reviewed-by: Michael Ho <[email protected]>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/bc56d3c4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/bc56d3c4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/bc56d3c4

Branch: refs/heads/master
Commit: bc56d3c48c3629bda989e1f6b8265bd42c1b5c63
Parents: 3bd21bc
Author: Attila Jeges <[email protected]>
Authored: Fri Jun 16 16:37:03 2017 +0200
Committer: Impala Public Jenkins <[email protected]>
Committed: Wed Jul 19 06:48:06 2017 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-sequence-table-writer.cc |  1 -
 .../queries/QueryTest/seq-writer.test     | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bc56d3c4/be/src/exec/hdfs-sequence-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-sequence-table-writer.cc b/be/src/exec/hdfs-sequence-table-writer.cc
index f8d7b4c..4a66c5e 100644
--- a/be/src/exec/hdfs-sequence-table-writer.cc
+++ b/be/src/exec/hdfs-sequence-table-writer.cc
@@ -348,7 +348,6 @@ Status HdfsSequenceTableWriter::Flush() {
   }
   out_.Clear();
   out_value_lengths_block_.Clear();
-  mem_pool_->FreeAll();
   unflushed_rows_ = 0;
   return Status::OK();
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bc56d3c4/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test b/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
index 753eb0f..7e2363f 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
@@ -288,3 +288,21 @@ select count(*) from store_sales_seq_gzip_block;
 ---- TYPES
 BIGINT
 ====
+---- QUERY
+# IMPALA-5407: Create a table containing seq files with GZIP+RECORD. If the number of
+# impalad workers is three, three files will be created, two of which are large enough
+# (> 64MB) to force multiple flushes. Make sure that the files have been created
+# successfully.
+SET COMPRESSION_CODEC=GZIP;
+SET SEQ_COMPRESSION_MODE=RECORD;
+SET ALLOW_UNSUPPORTED_FORMATS=1;
+create table catalog_sales_seq_gzip_rec like tpcds.catalog_sales stored as SEQUENCEFILE;
+insert into catalog_sales_seq_gzip_rec select * from tpcds.catalog_sales;
+====
+---- QUERY
+select count(*) from catalog_sales_seq_gzip_rec;
+---- RESULTS
+1441548
+---- TYPES
+BIGINT
+====
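
Illustration (not part of the patch): the failure mode described in the commit message can be sketched in isolation. The sketch below is a toy model under stated assumptions, not Impala code. ToyPool, ToyCompressor and CachedBufferSurvivesFlush are hypothetical names, and the generation counter exists only so that a stale buffer can be detected without ever dereferencing freed memory. It assumes, as the commit message states, that the compressor obtains a buffer from the pool once and keeps using it across calls to Flush().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory pool. This is NOT Impala's MemPool.
class ToyPool {
 public:
  // Returns a buffer that stays valid until the next FreeAll().
  std::uint8_t* Allocate(std::size_t size) {
    chunks_.push_back(std::make_unique<std::uint8_t[]>(size));
    return chunks_.back().get();
  }

  // Releases every chunk. Pointers handed out earlier become dangling.
  void FreeAll() {
    chunks_.clear();
    ++generation_;
  }

  int generation() const { return generation_; }

 private:
  std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
  int generation_ = 0;  // Bumped on FreeAll() so staleness can be detected.
};

// Hypothetical compressor that allocates its output buffer once and caches it.
class ToyCompressor {
 public:
  explicit ToyCompressor(ToyPool* pool) : pool_(pool) {
    assert(pool_ != nullptr);
  }

  // Allocates the output buffer on first use and reuses it afterwards.
  void Compress() {
    if (buffer_ == nullptr) {
      buffer_ = pool_->Allocate(64);
      buffer_generation_ = pool_->generation();
    }
    // A real compressor would write into buffer_ here. The sketch never
    // dereferences it, so it stays free of undefined behavior.
  }

  // True if the cached buffer still points at memory owned by the pool.
  bool BufferIsLive() const {
    return buffer_ != nullptr && buffer_generation_ == pool_->generation();
  }

 private:
  ToyPool* pool_;
  std::uint8_t* buffer_ = nullptr;
  int buffer_generation_ = 0;
};

// Simulates one write followed by one flush and reports whether the
// compressor's cached buffer would still be safe to use on the next write.
bool CachedBufferSurvivesFlush(bool free_pool_in_flush) {
  ToyPool pool;
  ToyCompressor compressor(&pool);
  compressor.Compress();                   // First batch caches the buffer.
  if (free_pool_in_flush) pool.FreeAll();  // Models the pre-fix Flush().
  return compressor.BufferIsLive();
}
```

In this model, CachedBufferSurvivesFlush(true) returns false, which corresponds to the second Flush() cycle writing through a dangling pointer, while CachedBufferSurvivesFlush(false) returns true, which corresponds to the behavior after the FreeAll() call is removed.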
