[jira] [Updated] (IMPALA-10812) [DOCS] RPC to submit query getting stuck for AWS NLB forever.

Amogh Margoor (Jira) Tue, 20 Jul 2021 07:47:04 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Amogh Margoor updated IMPALA-10812:
-----------------------------------
    Description: 
We need to document the behaviour of IMPALA-10811 as a limitation with 
AWS NLB. Problem description:

 

The initial RPC that submits a query and fetches the query handle can take a 
long time to return, because planning and submission can involve executing 
catalog operations such as Rename or Alter Table Recover Partitions, which are 
slow on tables with many partitions 
([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One side effect of such a long-running RPC is that clients such as 
impala-shell that connect through AWS NLB can get stuck forever. The reason is 
that NLB tracks connections and closes them after 350s of idle time, and this 
timeout cannot be configured. After closing the connection, NLB does not send a 
TCP RST to the client. Only when the client next tries to send data does NLB 
return a TCP RST to indicate that the connection is no longer alive. 
Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence clients like impala-shell that are waiting for the RPC to return get 
stuck indefinitely.
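
One possible client-side mitigation, sketched here under assumptions and not taken from the Impala code base: the AWS NLB documentation linked above describes TCP keepalive packets as resetting the idle timeout, so a client socket that sends keepalive probes well inside 350s should not look idle while it waits for the RPC. The helper name `enable_keepalive` and its default intervals are hypothetical, and whether impala-shell or its Thrift transport exposes the underlying socket for this is not established here.

```python
import socket

NLB_IDLE_TIMEOUT_S = 350  # idle timeout quoted in the issue description


def enable_keepalive(sock, idle_s=60, interval_s=30, probes=5):
    """Turn on TCP keepalive so probes flow while the client waits on a slow RPC.

    Hypothetical helper: the default values are illustrative, not recommended
    settings from Impala or AWS.
    """
    if idle_s >= NLB_IDLE_TIMEOUT_S:
        raise ValueError("first probe must be sent before the NLB idle timeout")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning constants are platform-specific (present on Linux),
    # so set them only where the socket module exposes them.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

A caller would apply this to the socket before connecting through the load balancer, for example `enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))`. On platforms without the per-socket constants, only `SO_KEEPALIVE` is set and the operating system's default keepalive idle time applies, which may be longer than 350s.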

 

  was:
The initial RPC that submits a query and fetches the query handle can take a 
long time to return, because planning and submission can involve executing 
catalog operations such as Rename or Alter Table Recover Partitions, which are 
slow on tables with many partitions 
([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One side effect of such a long-running RPC is that clients such as 
impala-shell that connect through AWS NLB can get stuck forever. The reason is 
that NLB tracks connections and closes them after 350s of idle time, and this 
timeout cannot be configured. After closing the connection, NLB does not send a 
TCP RST to the client. Only when the client next tries to send data does NLB 
return a TCP RST to indicate that the connection is no longer alive. 
Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence impala-shell, waiting for the RPC to return, gets stuck indefinitely.

Hence, we may need to evaluate techniques that let these RPCs return the query 
handle after:
 # Creating Driver: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150]
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

and execute the later parts of the RPC asynchronously in a different thread, 
without blocking the RPC. That way clients can get the query handle and poll it 
for state and results.
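
The shape of that proposal can be illustrated with a minimal sketch. This is not Impala's implementation: `QueryRegistry`, `submit`, `get_state`, and `wait_for` are hypothetical names, and the `work` callable stands in for the slow planning and catalog operations. The point is that the submit call returns a handle as soon as the query is registered, and the client then makes short polling calls instead of holding one long idle connection.

```python
import threading
import time
import uuid


class QueryRegistry:
    """Hypothetical stand-in for the server side; not Impala's actual classes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def submit(self, work):
        """Register the query, start the slow part in a thread, return at once."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._states[handle] = ("RUNNING", None)
        threading.Thread(target=self._run, args=(handle, work), daemon=True).start()
        return handle

    def _run(self, handle, work):
        # Runs off the RPC thread, so the submitting call is never blocked.
        try:
            outcome = ("FINISHED", work())
        except Exception as exc:
            outcome = ("ERROR", str(exc))
        with self._lock:
            self._states[handle] = outcome

    def get_state(self, handle):
        """Cheap call the client can repeat; returns (state, result)."""
        with self._lock:
            return self._states[handle]


def wait_for(registry, handle, poll_s=0.01, timeout_s=5.0):
    """Client side: short polling calls instead of one long blocking RPC."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state, result = registry.get_state(handle)
        if state != "RUNNING":
            return state, result
        time.sleep(poll_s)
    raise TimeoutError("query still running")
```

Because each poll sends data on the connection, a client using this pattern through a load balancer would also learn promptly if the connection had been dropped, rather than waiting on a reply that never arrives.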

 


> [DOCS] RPC to submit query getting stuck for AWS NLB forever.
> -------------------------------------------------------------
>
>                 Key: IMPALA-10812
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10812
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Amogh Margoor
>            Priority: Major


