[
https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019784#comment-17019784
]
ASF subversion and git services commented on IMPALA-9154:
---------------------------------------------------------
Commit 79aae231443a305ce8503dbc7b4335e8ae3f3946 in impala's branch
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=79aae23 ]
IMPALA-9154: Make runtime filter propagation asynchronous
This patch fixes a bug introduced by IMPALA-7984, which ported the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.
Specifically, in IMPALA-7984 the propagation of an aggregated runtime
filter was implemented with a synchronous KRPC call. Hence, when the
number of KRPC threads for Impala's data stream service is very
limited, e.g., 1, a deadlock occurs when the node running the
Coordinator tries to propagate the aggregated filter to itself: there
is no service thread left available to receive the aggregated filter.
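To illustrate the failure mode, here is a minimal, self-contained C++
sketch (plain standard C++, not Impala or KRPC code; all names in it
are made up for illustration). A toy "service pool" with a single
worker thread, analogous to running with
--datastream_service_num_svc_threads=1, runs a task that synchronously
waits for a second task queued on the same pool, mirroring the
coordinator's service thread waiting on an RPC that only it could
serve:
{noformat}
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

// Toy "service pool" with exactly one worker thread.
class SingleThreadService {
 public:
  SingleThreadService() : worker_([this] { Run(); }) {}
  ~SingleThreadService() {
    { std::lock_guard<std::mutex> l(mu_); stop_ = true; }
    cv_.notify_all();
    worker_.join();
  }
  void Submit(std::function<void()> task) {
    { std::lock_guard<std::mutex> l(mu_); tasks_.push_back(std::move(task)); }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested, nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;
};

int main() {
  SingleThreadService svc;
  std::promise<void> reply;
  std::future<void> reply_future = reply.get_future();

  // The handler of one request runs on the single service thread and
  // then issues a synchronous call back to the same service...
  svc.Submit([&] {
    // ...whose handler is queued behind us and needs the same (busy) thread.
    svc.Submit([&] { reply.set_value(); });
    // Synchronous wait for the reply. In the real bug this blocks forever;
    // a timeout is used here only so that the sketch terminates.
    if (reply_future.wait_for(std::chrono::seconds(1)) ==
        std::future_status::timeout) {
      std::cout << "deadlock: no free service thread to serve the RPC\n";
    }
  });

  std::this_thread::sleep_for(std::chrono::seconds(2));
  return 0;
}
{noformat}
In the real bug the wait has no timeout, so the single service thread
never becomes free again and the query hangs, which is consistent with
the stacks attached to the issue below.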
This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed while
the filter is still referenced by in-flight KRPCs, we add a field to
the class Coordinator::FilterState that tracks the number of in-flight
KRPCs propagating this filter, so that the memory is reclaimed only
when all the associated KRPCs have completed. Moreover, when
ReleaseExecResources() is invoked by the Coordinator to release all
the resources associated with query execution, including the memory
consumed by the aggregated runtime filters, the memory of each
aggregated filter is released only after its in-flight KRPCs have
finished.
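The following is a hedged sketch of the reference-counting idea
described above, in simplified C++. It is not the actual
Coordinator::FilterState code; names such as FilterStateSketch,
IncrementPendingRpcs(), PendingRpcDone(), and
WaitForInFlightRpcsAndRelease() are illustrative only:
{noformat}
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Simplified stand-in for the per-filter state kept by the coordinator.
class FilterStateSketch {
 public:
  // Called before each asynchronous propagation RPC is started.
  void IncrementPendingRpcs() {
    std::lock_guard<std::mutex> l(mu_);
    ++num_inflight_rpcs_;
  }

  // Invoked from each RPC's completion callback, on success or failure.
  void PendingRpcDone() {
    std::lock_guard<std::mutex> l(mu_);
    if (--num_inflight_rpcs_ == 0) all_rpcs_done_.notify_all();
  }

  // Teardown path (analogous to ReleaseExecResources()): block until no RPC
  // still references the aggregated filter, then free its memory.
  void WaitForInFlightRpcsAndRelease() {
    std::unique_lock<std::mutex> l(mu_);
    all_rpcs_done_.wait(l, [this] { return num_inflight_rpcs_ == 0; });
    aggregated_filter_.clear();
    aggregated_filter_.shrink_to_fit();  // actually return the memory
  }

 private:
  std::mutex mu_;
  std::condition_variable all_rpcs_done_;
  int64_t num_inflight_rpcs_ = 0;
  std::vector<uint8_t> aggregated_filter_;  // stand-in for the filter payload
};

int main() {
  FilterStateSketch state;
  state.IncrementPendingRpcs();
  // Simulated asynchronous RPC whose completion callback runs elsewhere.
  std::thread rpc_callback([&state] { state.PendingRpcDone(); });
  state.WaitForInFlightRpcsAndRelease();  // returns only after the callback
  rpc_callback.join();
  return 0;
}
{noformat}
The point of the sketch is that the release path (analogous to the
ReleaseExecResources() path described above) blocks until the
in-flight count drops to zero, so no completion callback can touch
memory that has already been freed.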
Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
Impala minicluster started with the parameter
--impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.
Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Reviewed-on: http://gerrit.cloudera.org:8080/14975
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> KRPC DataStreamService threads blocked in PublishFilter
> -------------------------------------------------------
>
> Key: IMPALA-9154
> URL: https://issues.apache.org/jira/browse/IMPALA-9154
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.4.0
> Reporter: Tim Armstrong
> Assignee: Fang-Yu Rao
> Priority: Blocker
> Labels: hang
> Attachments: image-2019-11-13-08-30-27-178.png, pstack-exchange.txt
>
>
> I hit this on primitive_many_fragments when doing a single node perf run:
> {noformat}
> ./bin/single_node_perf_run.py --num_impalads=1 --scale=30 --ninja
> --workloads=targeted-perf --iterations=5
> {noformat}
> I noticed that the query was hung and the execution threads were hung sending
> row batches. Then looking at the RPCz page, all of the threads were busy:
> !image-2019-11-13-08-30-27-178.png!
> Multiple threads were stuck in UpdateFilter() - see [^pstack-exchange.txt].
> It looks like a deadlock bug: a KRPC thread is blocked waiting for an RPC
> that can only be served by one of the limited threads from that same
> thread pool.