Michael Ho has posted comments on this change. Change subject: IMPALA-2399: Check for mem limit in allocations in parquet scanner and decompressor ......................................................................
Patch Set 10: (6 comments) http://gerrit.cloudera.org:8080/#/c/2203/10/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1681: if (UNLIKELY(!parse_status_.ok())) continue_execution = false; > It drops the status because specific non-OK query statuses are handled in P CommitRows() can now induce error as it may have exceeded the MemLimit when allocating a new row batch so it's reporting both query status and scan induced error. Dropping it seems wrong to me now. In addition, ProcessSplit() checks parse_status_ only so I don't see how query_status is being propagated to ProcessSplit(). In addition, the query_status could be set by UDF deep down in the stack during conjunct evaluation so it may not be correct to say the query error status is not induced by the scan. I guess the way UDF is propagating error is causing all these mixed interpretation of query status. Once we can make UDF return Status directly, we can get rid of all these confusing polling behavior. What's the expected frequency which we need to poll the query status ? Why do we do it in CommitRow() instead of say AssembleRows() or ProcessSplit() ? http://gerrit.cloudera.org:8080/#/c/2203/10/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 85: // batch_ != NULL so Close() can be called for proper cleanup in case of failure. > I don't follow this comment -- where does Close() care about batch_? The dependency is in the format specific scanner's Close() (e.g. HdfsParquetScanner::Close()). The reasoning is that this generic function is only called via the format specific scanner's Prepare() so in case say HdfsParquetScanner::Prepare() fails, we need to call HdfsParquetScanner::Close() to clean up for symmetry. We could consider cleaning things up in HdfsParquetScanner::Prepare() if the generic Scanner::Prepare() function fails. However, we still need to do the clean up in CreateAndPrepareScanner() if the failure happens inside HdfsParquetScanner::Prepare() instead so it's cleaner to have only one clean up path. I will clarify the comment further. Line 114: obj_pool_.Clear(); > where's the dependency on batch_? Please see replies above. http://gerrit.cloudera.org:8080/#/c/2203/10/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: Line 301: failed_allocation > how about failed_alloc_size to make it clearer that this is a length. Done Line 305: return status; > this is fine, but the only thing that does any real work is RuntimeState cl I thought about that and the only problem is that we may not have RuntimeState object available all the time (e.g. decompress.cc) so putting it in MemTracker() makes this function applicable in more locations as all exec nodes have mem trackers and most if not all call sites of Allocate() have mem trackers available. It's possible to use a static function of RuntimeState but it seems like a mismatch when RuntimeState object is actually available. http://gerrit.cloudera.org:8080/#/c/2203/10/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 244: logging > logging to the error log Done -- To view, visit http://gerrit.cloudera.org:8080/2203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic70400407b7662999332448f4d1bce2cc344ca89 Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
