ASF subversion and git services commented on IMPALA-6997:

Commit 8e5c18c3b789da8208611e77cd25899be78d4c8e in impala's branch 
refs/heads/2.x from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=8e5c18c ]

IMPALA-6997: Avoid redundant dumping in SetMemLimitExceeded()

When a UDF hits a MemLimitExceeded, the query does not
immediately abort. Instead, UDFs rely on the caller
checking the query_status_ periodically. This means that
on some codepaths, UDFs can call SetMemLimitExceeded()
many times (e.g. once per row) before the query fragment aborts.

RuntimeState::SetMemLimitExceeded() currently constructs
a MemLimitExceeded Status and dumps it for each call, even
if the query has already hit an error. This is expensive
and can delay a fragment from exiting when UDFs are
repeatedly hitting MemLimitExceeded.

This changes SetMemLimitExceeded() to avoid dumping if
the query_status_ is already not ok.

Change-Id: I92b87f370a68a2f695ebbc2520a98dd143730701
Reviewed-on: http://gerrit.cloudera.org:8080/10364
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Query execution should notice UDF MemLimitExceeded errors more quickly
> ----------------------------------------------------------------------
>                 Key: IMPALA-6997
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6997
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.13.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
> When a UDF hits a memory limit, it calls RuntimeState::SetMemLimitExceeded() 
> which sets the query status, but it has no way of returning status directly. 
> It relies on the caller checking status periodically.
> HdfsTableSink::Send() checks for errors by calling 
> RuntimeState::CheckQueryState() once at the beginning. If it is evaluating a 
> UDF and that UDF hits the memory limit, it will need to process the whole 
> RowBatch before it aborts the query. This could be 1024 rows and each row may 
> hit a memory limit in that UDF. Other locations that process UDFs may be 
> processing considerably more rows.
> There are two general approaches:
>  # Code locations should check for status more frequently and thus abort 
> faster after a RuntimeState::SetMemLimitExceeded() call.
>  # RuntimeState::SetMemLimitExceeded() should be substantially cheaper, 
> allowing the rows to be processed faster.
> RuntimeState::SetMemLimitExceeded() currently calls 
> MemTracker::MemLimitExceeded() unconditionally. It then checks whether it 
> should update query_status_ (i.e. whether query_status_ is currently ok), 
> and then logs the error. This is wasteful: MemTracker::MemLimitExceeded() is 
> not a cheap function, and logging for every row floods the log. 
> RuntimeState::SetMemLimitExceeded() should check query_status_ before 
> running MemTracker::MemLimitExceeded(). If query_status_ is already not ok, 
> it can avoid the cost of the dump and logging.

This message was sent by Atlassian JIRA
