Michael Ho has posted comments on this change.

Change subject: IMPALA-2399: Check for mem limit in allocations in parquet 
scanner and decompressor
......................................................................


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/2203/12/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1569:     RETURN_IF_ERROR(state_->CheckQueryState());
> What is this check needed for?  Is it just for UDF errors? 
Replaced with GetQueryStatus(). If we are to introduce GetUDFStatus(), we 
probably should do it in another patch. I think it would be easier if we focus 
on getting rid of SetMemLimitExceeded(). Once that's removed, query_status_ 
will be solely used by UDF.


http://gerrit.cloudera.org:8080/#/c/2203/12/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 88:   // most format specific scanners expect batch_ != NULL in their 
Close() functions.
> but this assumes StartNewRowBatch() will always set a batch_ even if return
That's something I considered but it seems to me those DCHECKs are good checks 
for the invariant when there is no failure so converting them to if-stmt may be 
missing coverage. I added a debug only bool which is set to true when there is 
no initialization error and DCHECK(batch_ != NULL || !scanner_inited_);


Line 196:   // TODO: move this to ProcessSplit() / ProcessRange().
> This check eventually will be needed only for UDF status, right?  So wouldn
If we have to keep it together with FreeLocalAllocations(), it may not be 
appropriate to move it out to ProcessSplit() / ProcessRange() as we may 
accumulate too much local allocations. We shouldn't move this check out of  
CommitRows() after all.

One small thing is that the existing code treats UDF error not as parse_status_ 
so it will return from ProcessRange() right away when query status is not ok. 
On the other hand, the code may cripple on for parse errors. So, I took Skye's 
suggestion to set parse_status_ for the return vale of StartNewRowBatch(). Not 
sure if we want to reconsider this behavior in the long run ?


http://gerrit.cloudera.org:8080/#/c/2203/12/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

Line 290:   // cases where we use Status::CANCELLED to indicate that the limit 
was reached.
> I don't think we want to do this TODO, right? how about we delete it?
Done


Line 293:   if (UNLIKELY(!query_status_.ok())) return GetQueryStatus();
> should this if-stmt go in GetQueryStatus() so that direct callers also bene
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70400407b7662999332448f4d1bce2cc344ca89
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to