[ https://issues.apache.org/jira/browse/IMPALA-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476772#comment-16476772 ]
ASF subversion and git services commented on IMPALA-6997:
---------------------------------------------------------

Commit caf275c11a62c33d0211e71f3285c4977dd6799d in impala's branch refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=caf275c ]

IMPALA-6997: Avoid redundant dumping in SetMemLimitExceeded()

When a UDF hits a MemLimitExceeded, the query does not immediately
abort. Instead, UDFs rely on the caller checking the query_status_
periodically. This means that on some codepaths, UDFs can call
SetMemLimitExceeded() many times (e.g. once per row) before the
query fragment exits.

RuntimeState::SetMemLimitExceeded() currently constructs a
MemLimitExceeded Status and dumps it for each call, even if
the query has already hit an error. This is expensive and can
delay a fragment from exiting when UDFs are repeatedly hitting
MemLimitExceeded.

This changes SetMemLimitExceeded() to avoid dumping if the
query_status_ is already not ok.

Change-Id: I92b87f370a68a2f695ebbc2520a98dd143730701
Reviewed-on: http://gerrit.cloudera.org:8080/10364
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Query execution should notice UDF MemLimitExceeded errors more quickly
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-6997
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6997
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.13.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> When a UDF hits a memory limit, it calls RuntimeState::SetMemLimitExceeded(),
> which sets the query status, but it has no way of returning status directly.
> It relies on the caller checking status periodically.
> HdfsTableSink::Send() checks for errors by calling
> RuntimeState::CheckQueryState() once at the beginning.
> If it is evaluating a UDF and that UDF hits the memory limit, it will need
> to process the whole RowBatch before it aborts the query. This could be
> 1024 rows, and each row may hit a memory limit in that UDF. Other locations
> that process UDFs may be processing considerably more rows.
> There are two general approaches:
> # Code locations should check for status more frequently and thus abort
> faster after a RuntimeState::SetMemLimitExceeded() call.
> # RuntimeState::SetMemLimitExceeded() should be substantially cheaper,
> allowing the rows to be processed faster.
> RuntimeState::SetMemLimitExceeded() currently calls
> MemTracker::MemLimitExceeded() unconditionally. It then checks to see if it
> should update query_status_ (i.e. query_status_ is currently ok). Then it
> logs this error. This is wasteful, because MemTracker::MemLimitExceeded() is
> not a cheap function, and this floods the log with one entry per row.
> RuntimeState::SetMemLimitExceeded() should check status before running
> MemTracker::MemLimitExceeded(). If query_status_ is already not ok, it can
> avoid the cost of the dump and logging.
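
The check-before-dump idea described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: `Status`, `SketchRuntimeState`, `BuildMemLimitExceeded()` and `dump_count()` are hypothetical stand-ins for the real Impala classes, and `std::mutex` is an assumed choice of lock. The point is just that the expensive dump is skipped once `query_status_` is already not ok.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, minimal stand-in for a status object (not Impala's Status).
class Status {
 public:
  Status() = default;  // default-constructed status is "ok"
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool ok() const { return ok_; }
  const std::string& msg() const { return msg_; }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Hypothetical sketch of the relevant part of a runtime state object.
class SketchRuntimeState {
 public:
  // May be called once per row by a UDF that keeps failing allocations.
  void SetMemLimitExceeded(const std::string& details) {
    {
      // Cheap early exit: the query already failed, so building and
      // logging another MemLimitExceeded status would be wasted work.
      std::lock_guard<std::mutex> l(lock_);
      if (!query_status_.ok()) return;
    }
    // Expensive part, done outside the lock (assumption for this sketch).
    Status status = BuildMemLimitExceeded(details);
    std::lock_guard<std::mutex> l(lock_);
    // Re-check: another thread may have set an error in the meantime.
    if (query_status_.ok()) query_status_ = status;
  }

  Status query_status() {
    std::lock_guard<std::mutex> l(lock_);
    return query_status_;
  }

  // Number of times the expensive dump was actually built.
  int dump_count() {
    std::lock_guard<std::mutex> l(lock_);
    return dump_count_;
  }

 private:
  // Stand-in for the costly memory-tracker dump plus logging.
  Status BuildMemLimitExceeded(const std::string& details) {
    std::lock_guard<std::mutex> l(lock_);
    ++dump_count_;
    return Status("Memory limit exceeded: " + details);
  }

  std::mutex lock_;
  Status query_status_;
  int dump_count_ = 0;
};
```

With this shape, a caller that invokes `SetMemLimitExceeded()` for every row of a 1024-row batch on a single thread pays for the dump only on the first call; the remaining calls return after the status check.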