IMPALA-4444: Transfer row group resources to row batch on scan failure

Previously, if any column reader failed in HdfsParquetScanner::AssembleRows(),
the memory pools associated with the ScratchTupleBatch were freed. This is
problematic because the ScratchTupleBatch may hold memory pools that are still
referenced by row batches already shipped upstream. This can happen because the
memory pools used by the Parquet column readers (e.g. decompressor_pool_) are
not transferred to the ScratchTupleBatch until a data page is exhausted, so the
memory pools of the previous data page are always attached to the
ScratchTupleBatch of the current data page. On a scan failure, it is therefore
not safe to simply free the memory pools attached to the current
ScratchTupleBatch.
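The ownership hazard can be sketched with simplified stand-ins. Note that
SimplePool, AcquireData here, and DemoTransferOnFailure are illustrative names
modeled loosely on Impala's MemPool/RowBatch, not the actual classes:

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified pool: it owns heap chunks, and AcquireData()
// moves another pool's chunks into it instead of freeing them.
class SimplePool {
 public:
  // Allocate a buffer owned by this pool.
  char* Allocate(size_t size) {
    chunks_.push_back(std::make_unique<char[]>(size));
    return chunks_.back().get();
  }
  // Transfer ownership of all of 'src's chunks to this pool. Buffers handed
  // out by 'src' stay valid for as long as this pool lives.
  void AcquireData(SimplePool* src) {
    for (auto& c : src->chunks_) chunks_.push_back(std::move(c));
    src->chunks_.clear();
  }
  // Free everything owned by this pool (analogous to MemPool::FreeAll()).
  void FreeAll() { chunks_.clear(); }
  size_t num_chunks() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// On a scan failure, transferring the scratch pool's memory to the outgoing
// row batch's pool keeps buffers referenced by already-shipped rows alive;
// calling scratch_pool.FreeAll() here instead would leave them dangling.
std::string DemoTransferOnFailure() {
  SimplePool scratch_pool, row_batch_pool;
  char* buf = scratch_pool.Allocate(16);
  std::strcpy(buf, "shipped row");            // upstream still references buf
  row_batch_pool.AcquireData(&scratch_pool);  // the fix: transfer, don't free
  assert(scratch_pool.num_chunks() == 0);
  return std::string(buf);  // still valid: row_batch_pool now owns the chunk
}
```

This mirrors the shape of the fix below: on failure the scanner hands the row
group's pools to the row batch rather than freeing them in place.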
This patch fixes the problem above by transferring the memory pools and other
resources associated with a row group to the current row batch in the Parquet
scanner on scan failure, so they can eventually be freed by upstream operators
as the row batch is consumed.

Change-Id: Id70df470e98dd96284fd176bfbb946e9637ad126
Reviewed-on: http://gerrit.cloudera.org:8080/5052
Reviewed-by: Michael Ho <[email protected]>
Tested-by: Internal Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/38ee3b69
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/38ee3b69
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/38ee3b69

Branch: refs/heads/master
Commit: 38ee3b6942f3e044555ee399c51cba65f0b35267
Parents: 6937fa9
Author: Michael Ho <[email protected]>
Authored: Thu Nov 10 13:33:46 2016 -0800
Committer: Internal Jenkins <[email protected]>
Committed: Tue Nov 15 23:02:50 2016 +0000

----------------------------------------------------------------------
 be/src/exec/exec-node.cc            | 2 +-
 be/src/exec/hdfs-parquet-scanner.cc | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/38ee3b69/be/src/exec/exec-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exec-node.cc b/be/src/exec/exec-node.cc
index 255217f..1d93626 100644
--- a/be/src/exec/exec-node.cc
+++ b/be/src/exec/exec-node.cc
@@ -429,7 +429,7 @@ Status ExecNode::ExecDebugAction(TExecNodePhase::type phase, RuntimeState* state
     return Status::OK();
   }
   if (debug_action_ == TDebugAction::MEM_LIMIT_EXCEEDED) {
-    mem_tracker()->MemLimitExceeded(state, "Debug Action: MEM_LIMIT_EXCEEDED");
+    return mem_tracker()->MemLimitExceeded(state, "Debug Action: MEM_LIMIT_EXCEEDED");
  }
   return Status::OK();
 }
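The exec-node.cc hunk fixes a swallowed-status bug: the error status produced
by MemLimitExceeded() was constructed but never returned, so the injected
failure was silently dropped. A minimal sketch of the pattern, using a
simplified stand-in Status class and illustrative function names rather than
Impala's real ones:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal stand-in for a Status type: ok() is true unless an error
// message was attached. (Illustrative only, not Impala's actual class.)
class Status {
 public:
  Status() {}
  explicit Status(std::string msg) : msg_(std::move(msg)) {}
  static Status OK() { return Status(); }
  bool ok() const { return msg_.empty(); }
  const std::string& msg() const { return msg_; }

 private:
  std::string msg_;
};

Status MemLimitExceededDemo() { return Status("Memory limit exceeded"); }

// Buggy shape: the error status is constructed but discarded, so the caller
// sees OK and execution continues past the injected failure.
Status ExecDebugActionBuggy(bool trigger) {
  if (trigger) {
    MemLimitExceededDemo();  // BUG: result silently dropped
  }
  return Status::OK();
}

// Fixed shape (what the patch does): return the error status so it
// propagates to the caller.
Status ExecDebugActionFixed(bool trigger) {
  if (trigger) {
    return MemLimitExceededDemo();
  }
  return Status::OK();
}
```

With the fix, callers of ExecDebugAction() actually observe the
MEM_LIMIT_EXCEEDED debug action as an error, which is what exercises the
failure path patched in the scanner below.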
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/38ee3b69/be/src/exec/hdfs-parquet-scanner.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-parquet-scanner.cc b/be/src/exec/hdfs-parquet-scanner.cc
index bd5d65c..6b157aa 100644
--- a/be/src/exec/hdfs-parquet-scanner.cc
+++ b/be/src/exec/hdfs-parquet-scanner.cc
@@ -537,8 +537,9 @@ Status HdfsParquetScanner::AssembleRows(
     bool num_tuples_mismatch = c != 0 && last_num_tuples != scratch_batch_->num_tuples;
     if (UNLIKELY(!continue_execution || num_tuples_mismatch)) {
       // Skipping this row group. Free up all the resources with this row group.
-      scratch_batch_->mem_pool()->FreeAll();
+      FlushRowGroupResources(row_batch);
       scratch_batch_->num_tuples = 0;
+      DCHECK(scratch_batch_->AtEnd());
       *skip_row_group = true;
       if (num_tuples_mismatch) {
         parse_status_.MergeStatus(Substitute("Corrupt Parquet file '$0': column '$1' "
