IMPALA-4444: Transfer row group resources to row batch on scan failure

Previously, if any column reader failed in HdfsParquetScanner::AssembleRows(),
the memory pools associated with the ScratchTupleBatch were freed. This is
problematic because the ScratchTupleBatch may hold memory pools that are still
referenced by row batches already shipped upstream. This can happen because the
memory pools used by the Parquet column readers (e.g. decompressor_pool_) are
not transferred to the ScratchTupleBatch until a data page is exhausted, so the
memory pools of the previous data page are always attached to the
ScratchTupleBatch of the current data page. On a scan failure, it is therefore
not safe to simply free the memory pools attached to the current
ScratchTupleBatch.
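The ownership hazard can be sketched with simplified stand-ins. Note that
SimplePool, AcquireData here, and DemoTransferOnFailure are illustrative names
modeled loosely on Impala's MemPool/RowBatch, not the actual classes:

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified pool: it owns heap chunks, and AcquireData()
// moves another pool's chunks into it instead of freeing them.
class SimplePool {
 public:
  // Allocate a buffer owned by this pool.
  char* Allocate(size_t size) {
    chunks_.push_back(std::make_unique<char[]>(size));
    return chunks_.back().get();
  }
  // Transfer ownership of all of 'src's chunks to this pool. Buffers handed
  // out by 'src' stay valid for as long as this pool lives.
  void AcquireData(SimplePool* src) {
    for (auto& c : src->chunks_) chunks_.push_back(std::move(c));
    src->chunks_.clear();
  }
  // Free everything owned by this pool (analogous to MemPool::FreeAll()).
  void FreeAll() { chunks_.clear(); }
  size_t num_chunks() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// On a scan failure, transferring the scratch pool's memory to the outgoing
// row batch's pool keeps buffers referenced by already-shipped rows alive;
// calling scratch_pool.FreeAll() here instead would leave them dangling.
std::string DemoTransferOnFailure() {
  SimplePool scratch_pool, row_batch_pool;
  char* buf = scratch_pool.Allocate(16);
  std::strcpy(buf, "shipped row");            // upstream still references buf
  row_batch_pool.AcquireData(&scratch_pool);  // the fix: transfer, don't free
  assert(scratch_pool.num_chunks() == 0);
  return std::string(buf);  // still valid: row_batch_pool now owns the chunk
}
```

This mirrors the shape of the fix below: on failure the scanner hands the row
group's pools to the row batch rather than freeing them in place.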
This patch fixes the problem above by transferring the memory pools and other
resources associated with a row group to the current row batch in the Parquet
scanner on scan failure, so they can eventually be freed by upstream operators
as the row batch is consumed.

Change-Id: Id70df470e98dd96284fd176bfbb946e9637ad126
Reviewed-on: http://gerrit.cloudera.org:8080/5052
Reviewed-by: Michael Ho <[email protected]>
Tested-by: Internal Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/38ee3b69
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/38ee3b69
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/38ee3b69

Branch: refs/heads/master
Commit: 38ee3b6942f3e044555ee399c51cba65f0b35267
Parents: 6937fa9
Author: Michael Ho <[email protected]>
Authored: Thu Nov 10 13:33:46 2016 -0800
Committer: Internal Jenkins <[email protected]>
Committed: Tue Nov 15 23:02:50 2016 +0000

----------------------------------------------------------------------
 be/src/exec/exec-node.cc            | 2 +-
 be/src/exec/hdfs-parquet-scanner.cc | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/38ee3b69/be/src/exec/exec-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exec-node.cc b/be/src/exec/exec-node.cc
index 255217f..1d93626 100644
--- a/be/src/exec/exec-node.cc
+++ b/be/src/exec/exec-node.cc
@@ -429,7 +429,7 @@ Status ExecNode::ExecDebugAction(TExecNodePhase::type phase, RuntimeState* state
     return Status::OK();
   }
   if (debug_action_ == TDebugAction::MEM_LIMIT_EXCEEDED) {
-    mem_tracker()->MemLimitExceeded(state, "Debug Action: MEM_LIMIT_EXCEEDED");
+    return mem_tracker()->MemLimitExceeded(state, "Debug Action: MEM_LIMIT_EXCEEDED");
  }
   return Status::OK();
 }
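The exec-node.cc hunk fixes a swallowed-status bug: the error status produced
by MemLimitExceeded() was constructed but never returned, so the injected
failure was silently dropped. A minimal sketch of the pattern, using a
simplified stand-in Status class and illustrative function names rather than
Impala's real ones:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal stand-in for a Status type: ok() is true unless an error
// message was attached. (Illustrative only, not Impala's actual class.)
class Status {
 public:
  Status() {}
  explicit Status(std::string msg) : msg_(std::move(msg)) {}
  static Status OK() { return Status(); }
  bool ok() const { return msg_.empty(); }
  const std::string& msg() const { return msg_; }

 private:
  std::string msg_;
};

Status MemLimitExceededDemo() { return Status("Memory limit exceeded"); }

// Buggy shape: the error status is constructed but discarded, so the caller
// sees OK and execution continues past the injected failure.
Status ExecDebugActionBuggy(bool trigger) {
  if (trigger) {
    MemLimitExceededDemo();  // BUG: result silently dropped
  }
  return Status::OK();
}

// Fixed shape (what the patch does): return the error status so it
// propagates to the caller.
Status ExecDebugActionFixed(bool trigger) {
  if (trigger) {
    return MemLimitExceededDemo();
  }
  return Status::OK();
}
```

With the fix, callers of ExecDebugAction() actually observe the
MEM_LIMIT_EXCEEDED debug action as an error, which is what exercises the
failure path patched in the scanner below.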
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/38ee3b69/be/src/exec/hdfs-parquet-scanner.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-parquet-scanner.cc b/be/src/exec/hdfs-parquet-scanner.cc
index bd5d65c..6b157aa 100644
--- a/be/src/exec/hdfs-parquet-scanner.cc
+++ b/be/src/exec/hdfs-parquet-scanner.cc
@@ -537,8 +537,9 @@ Status HdfsParquetScanner::AssembleRows(
     bool num_tuples_mismatch = c != 0 && last_num_tuples != scratch_batch_->num_tuples;
     if (UNLIKELY(!continue_execution || num_tuples_mismatch)) {
       // Skipping this row group. Free up all the resources with this row group.
-      scratch_batch_->mem_pool()->FreeAll();
+      FlushRowGroupResources(row_batch);
       scratch_batch_->num_tuples = 0;
+      DCHECK(scratch_batch_->AtEnd());
       *skip_row_group = true;
       if (num_tuples_mismatch) {
         parse_status_.MergeStatus(Substitute("Corrupt Parquet file '$0': column '$1' "
