IMPALA-3528: Transfer scratch tuple memory in Close() of Parquet scanner. The lifetime of a scanner thread is decoupled from that of row batches that it produces. That means that all resources associated with row batches produced by the scanner thread should be transferred to those batches.
The bug was that we were not transferring the ownership of memory from the scratch batch to the final row batch returned in HdfsParquetScanner::Close(). Triggering an event that would cause the freed memory to be dereferenced is possible, but very difficult. My understanding is that it is only possible in exceptional non-deterministic scenarios, e.g., a query is cancelled just at the right time, or the scanner hits a parse/decoding error. Testing: I tested this change locally by running the scanner and nested types test as well as TPCH, nested TPCH, and TPC-DS. Change-Id: Ic34d32c9a41ea66b2b2d8f5e187cc84d4cb569b2 Reviewed-on: http://gerrit.cloudera.org:8080/3041 Reviewed-by: Alex Behm <[email protected]> Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/e96b4635 Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/e96b4635 Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/e96b4635 Branch: refs/heads/master Commit: e96b463587cf5e7e5211741bcbc70237d1b1b2a6 Parents: 7e0cbaf Author: Alex Behm <[email protected]> Authored: Wed May 11 18:22:41 2016 -0700 Committer: Tim Armstrong <[email protected]> Committed: Thu May 12 23:06:36 2016 -0700 ---------------------------------------------------------------------- be/src/exec/hdfs-parquet-scanner.cc | 2 ++ 1 file changed, 2 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/e96b4635/be/src/exec/hdfs-parquet-scanner.cc ---------------------------------------------------------------------- diff --git a/be/src/exec/hdfs-parquet-scanner.cc b/be/src/exec/hdfs-parquet-scanner.cc index 47e19a6..3fcb693 100644 --- a/be/src/exec/hdfs-parquet-scanner.cc +++ b/be/src/exec/hdfs-parquet-scanner.cc @@ -1156,10 +1156,12 @@ void HdfsParquetScanner::Close() { } if (batch_ != NULL) { AttachPool(dictionary_pool_.get(), false); + AttachPool(scratch_batch_->mem_pool(), false); AddFinalRowBatch(); } // Verify all resources (if any) have been transferred. DCHECK_EQ(dictionary_pool_.get()->total_allocated_bytes(), 0); + DCHECK_EQ(scratch_batch_->mem_pool()->total_allocated_bytes(), 0); DCHECK_EQ(context_->num_completed_io_buffers(), 0); // If this was a metadata only read (i.e. count(*)), there are no columns. if (compression_types.empty()) compression_types.push_back(THdfsCompression::NONE);
