[
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amogh Margoor updated IMPALA-10811:
-----------------------------------
Description:
The initial RPC that submits a query and fetches the query handle can take a
long time to return because of expensive Catalog operations such as Rename or
Alter Table Recover Partitions on tables with many partitions. Attached is the
profile of one such DDL query.
These RPCs are:
1. Beeswax:
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
2. HS2:
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
One side effect of such a long-running RPC is that clients such as
impala-shell connecting through an AWS NLB can get stuck forever. The reason is
that the NLB tracks connections and closes them after 350s of idle time, and
this timeout cannot be configured. After closing the connection, however, the
NLB doesn't send a TCP RST to the client. Only when the client tries to send
data or packets does the NLB reply with a TCP RST to indicate that the
connection is no longer alive. Documentation is here:
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
Hence an impala-shell waiting for the RPC to return gets stuck indefinitely.
Hence, we may need to evaluate techniques for the RPCs to return the query
handle after
# Creating Driver,
# Register Query
([https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168])
and execute the later parts of the RPC asynchronously in a different thread
without blocking the RPC. That way clients can get the query handle and poll it
for state and results.
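A minimal sketch of the register-then-return idea described above, written in
Python purely for illustration. All names here (ToyQueryServer, submit_query,
get_state, fetch_result, poll_until_done) are hypothetical and are not Impala's
code or API. The sketch assumes the expensive part of the request can be handed
to a background thread once the query is registered, so the submit call returns
a handle immediately and the client learns the outcome through short polling
calls.

```python
import threading
import time
import uuid
from enum import Enum


class QueryState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


class ToyQueryServer:
    """Hypothetical stand-in for the server side; not Impala's actual code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queries = {}  # handle -> {"state": QueryState, "result": object}

    def submit_query(self, work):
        """Register the query, start `work` in the background, return a handle."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._queries[handle] = {"state": QueryState.RUNNING, "result": None}
        # The expensive part runs in its own thread, so this call does not block.
        threading.Thread(target=self._run, args=(handle, work), daemon=True).start()
        return handle

    def _run(self, handle, work):
        try:
            result, state = work(), QueryState.FINISHED
        except Exception as exc:  # record the failure for the polling client
            result, state = str(exc), QueryState.ERROR
        with self._lock:
            self._queries[handle] = {"state": state, "result": result}

    def get_state(self, handle):
        with self._lock:
            return self._queries[handle]["state"]

    def fetch_result(self, handle):
        with self._lock:
            return self._queries[handle]["result"]


def poll_until_done(server, handle, interval_s=0.01, timeout_s=5.0):
    """Client side: poll the state with short calls until the query completes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = server.get_state(handle)
        if state is not QueryState.RUNNING:
            return state
        time.sleep(interval_s)
    raise TimeoutError("query %s still running after %.1fs" % (handle, timeout_s))


if __name__ == "__main__":
    def slow_ddl():
        time.sleep(0.2)  # stands in for an expensive Catalog operation
        return "ok"

    demo_server = ToyQueryServer()
    demo_handle = demo_server.submit_query(slow_ddl)
    print(poll_until_done(demo_server, demo_handle), demo_server.fetch_result(demo_handle))
```

With this shape no single request has to stay open for the whole duration of
the DDL, and because each poll sends data, a connection that the load balancer
has already dropped would be expected to surface as an error on the next poll
instead of a silent hang.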
was:
The initial RPC that submits a query and fetches the query handle can take a
long time to return because of expensive Catalog operations such as Rename or
Alter Table Recover Partitions on tables with many partitions. Attached is the
profile of one such DDL query.
These RPCs are:
1. Beeswax:
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
2. HS2:
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
One side effect of such a long-running RPC is that clients such as
impala-shell connecting through an AWS NLB can get stuck forever. The reason is
that the NLB tracks connections and closes them after 350s of idle time, and
this timeout cannot be configured. After closing the connection, however, the
NLB doesn't send a TCP RST to the client. Only when the client tries to send
data or packets does the NLB reply with a TCP RST to indicate that the
connection is no longer alive. Documentation is here:
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
Hence an impala-shell waiting for the RPC to return gets stuck indefinitely.
Hence, we may need to evaluate techniques for the RPCs to return the query
handle sooner, right after the Query Registration (), and execute the later
parts of the RPC asynchronously so that clients can get the query handle and
poll it for results.
> RPC to submit query getting stuck for AWS NLB forever.
> ------------------------------------------------------
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
> Issue Type: Bug
> Reporter: Amogh Margoor
> Priority: Major
>