[
https://issues.apache.org/jira/browse/IMPALA-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543899#comment-16543899
]
Pooja Nilangekar commented on IMPALA-6153:
------------------------------------------
I looked at the lifecycle of the various objects involved. Currently,
Coordinator::UpdateFilter doesn't use the filter_mem_tracker_ if the query's
resources have already been released. Specifically, this line
[https://github.com/apache/impala/blob/3da2dc63fe0c716998cdbdf6334036fbc7698714/be/src/runtime/coordinator.cc#L814]
ensures that it doesn't apply the update if the state is disabled. One way to
make this cleaner is to fail early: check exec_state_ at the beginning of
UpdateFilter and look up the filter in filter_routing_table_ only if the query
is still executing.
After UpdateFilter releases the lock here
[https://github.com/apache/impala/blob/3da2dc63fe0c716998cdbdf6334036fbc7698714/be/src/runtime/coordinator.cc#L854]
UpdateFilter doesn't access any memory that belongs to the filter, because it
has already copied the contents of the filter into rpc_params allocated on its
stack. After it releases the lock, UpdateFilter only publishes the filters to
all the backend_states_, and IMPALA-6144 ensures that PublishFilter fails early
if the backend IsDone. (We could add another check before publishing the
updates, but I think that would end up affecting the parallelism across
filters.)
So I think it would make sense to simply fail early if an update corresponds to
a query which is no longer executing. Does this approach sound reasonable?
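To make the proposal concrete, here is a minimal, self-contained C++ sketch of
the fail-early ordering. It is not the actual Impala code: the Coordinator and
BackendState classes below are simplified, hypothetical stand-ins, the filter
payload is modelled as a std::string, and ReleaseExecResources, AddBackend,
RegisterFilter and the bool return values are assumptions made only for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for the real Impala types.
enum class ExecState { EXECUTING, RELEASED };

struct BackendState {
  bool done = false;
  int published = 0;  // counts accepted publishes, for illustration only

  // Models the IMPALA-6144 behaviour: fail early once the backend is done.
  bool PublishFilter(const std::string& rpc_params) {
    if (done) return false;
    (void)rpc_params;  // a real implementation would send an RPC here
    ++published;
    return true;
  }
};

class Coordinator {
 public:
  void AddBackend() {
    std::lock_guard<std::mutex> l(lock_);
    // Assumption: backends are only registered before teardown.
    assert(exec_state_ == ExecState::EXECUTING);
    backend_states_.emplace_back();
  }

  void RegisterFilter(std::int32_t filter_id) {
    std::lock_guard<std::mutex> l(lock_);
    filter_routing_table_.emplace(filter_id, std::string());
  }

  // Hypothetical teardown hook: marks the query as no longer executing and
  // drops the filter state.
  void ReleaseExecResources() {
    std::lock_guard<std::mutex> l(lock_);
    exec_state_ = ExecState::RELEASED;
    filter_routing_table_.clear();
  }

  // Returns false if the update was rejected, true if it was applied.
  bool UpdateFilter(std::int32_t filter_id, const std::string& update) {
    std::string rpc_params;  // local copy, owned by this call
    {
      std::lock_guard<std::mutex> l(lock_);
      // Fail early: check exec_state_ before touching the routing table.
      if (exec_state_ != ExecState::EXECUTING) return false;
      auto it = filter_routing_table_.find(filter_id);
      if (it == filter_routing_table_.end()) return false;
      it->second += update;     // stand-in for aggregating the filter
      rpc_params = it->second;  // copy the contents while holding the lock
    }
    // After the lock is released, only the local copy is used.
    // Assumption: backend_states_ does not change after setup.
    for (BackendState& backend : backend_states_) {
      backend.PublishFilter(rpc_params);
    }
    return true;
  }

  const BackendState& backend(std::size_t i) const { return backend_states_[i]; }

 private:
  std::mutex lock_;
  ExecState exec_state_ = ExecState::EXECUTING;
  std::unordered_map<std::int32_t, std::string> filter_routing_table_;
  std::vector<BackendState> backend_states_;
};
```

In this sketch an update that arrives after ReleaseExecResources is rejected
before the routing table is consulted, and the publish loop only reads the
local rpc_params copy, which matches the ordering described above.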
> Prevent Coordinator::UpdateFilter() running after query exec resources are
> released
> -----------------------------------------------------------------------------------
>
> Key: IMPALA-6153
> URL: https://issues.apache.org/jira/browse/IMPALA-6153
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Sailesh Mukil
> Assignee: Pooja Nilangekar
> Priority: Major
> Labels: query-lifecycle, runtime-filters
>
> Coordinator::UpdateFilter() and CoordinatorBackendState::PublishFilter() run
> independent of the lifecycle of any fragment instance. This is problematic
> during query teardown.
> Specifically, we should not release resources for a query if any one of the
> above functions is still running for that query, and we also should not
> start running the above methods after resources are released for the query.
> Also, the 'rpc_params' in UpdateFilter() could potentially hold large amounts
> of untracked memory, so we should track it.