[
https://issues.apache.org/jira/browse/IMPALA-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-10788.
-------------------------------------
Fix Version/s: Impala 4.1
Resolution: Fixed
> Statestore Scalability document should mention
> statestore_subscriber_timeout_secs and
> statestore_heartbeat_tcp_timeout_seconds
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10788
> URL: https://issues.apache.org/jira/browse/IMPALA-10788
> Project: IMPALA
> Issue Type: Documentation
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Fix For: Impala 4.1
>
>
> The current document about statestore scalability is
> [http://impala.apache.org/docs/build/html/topics/impala_scalability.html#ariaid-title3]
> We should add statestore_heartbeat_tcp_timeout_seconds and
> statestore_subscriber_timeout_secs to the doc. They also impact heartbeat
> timeout detection. Incorrect settings may result in false-positive liveness
> detection and queries failing with the "Cancelled due to unreachable
> impalad(s)" error.
> Current document:
> ---------------------------------------------------
> h3. Scalability Considerations for the Impala Statestore
> Before Impala 2.1, the statestore sent only one kind of message to its
> subscribers. This message contained all updates for any topics that a
> subscriber had subscribed to. It also served to let subscribers know that the
> statestore had not failed, and conversely the statestore used the success of
> sending a heartbeat to a subscriber to decide whether or not the subscriber
> had failed.
> Combining topic updates and failure detection in a single message led to
> bottlenecks in clusters with large numbers of tables, partitions, and HDFS
> data blocks. When the statestore was overloaded with metadata updates to
> transmit, heartbeat messages were sent less frequently, sometimes causing
> subscribers to time out their connection with the statestore. Increasing the
> subscriber timeout and decreasing the frequency of statestore heartbeats
> worked around the problem, but reduced responsiveness when the statestore
> failed or restarted.
> As of Impala 2.1, the statestore now sends topic updates and heartbeats in
> separate messages. This allows the statestore to send and receive a steady
> stream of lightweight heartbeats, and removes the requirement to send topic
> updates according to a fixed schedule, reducing statestore network overhead.
> The statestore now has the following relevant configuration flags for the
> statestored daemon:
> {{-statestore_num_update_threads}}
> The number of threads inside the statestore dedicated to sending topic
> updates. You should not typically need to change this value.
> *Default:* 10
> {{-statestore_update_frequency_ms}}
> The frequency, in milliseconds, with which the statestore tries to send
> topic updates to each subscriber. This is a best-effort value; if the
> statestore is unable to meet this frequency, it sends topic updates as fast
> as it can. You should not typically need to change this value.
> *Default:* 2000
> {{-statestore_num_heartbeat_threads}}
> The number of threads inside the statestore dedicated to sending heartbeats.
> You should not typically need to change this value.
> *Default:* 10
> {{-statestore_heartbeat_frequency_ms}}
> The frequency, in milliseconds, with which the statestore tries to send
> heartbeats to each subscriber. This value should be good for large catalogs
> and clusters up to approximately 150 nodes. Beyond that, you might need to
> increase this value to lengthen the interval between heartbeat messages.
> *Default:* 1000 (one heartbeat message every second)
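> To see why the defaults are sized for clusters of roughly 150 nodes, here is
> a minimal back-of-envelope sketch in Python. The 50 ms per-heartbeat RPC cost
> is a hypothetical assumption for illustration, not a measured value:
> {code:python}
> # Rough capacity estimate for the statestore heartbeat thread pool.
> # Assumption (hypothetical): each heartbeat RPC occupies a sending thread
> # for about 50 ms, including the network round trip and subscriber handling.
> HEARTBEAT_RPC_COST_SECS = 0.05
>
> def max_heartbeated_nodes(num_heartbeat_threads=10,
>                           heartbeat_frequency_ms=1000,
>                           rpc_cost_secs=HEARTBEAT_RPC_COST_SECS):
>     """Upper bound on subscribers that can be heartbeated once per period."""
>     period_secs = heartbeat_frequency_ms / 1000.0
>     return round(num_heartbeat_threads * period_secs / rpc_cost_secs)
>
> print(max_heartbeated_nodes())  # -> 200 with default flags and the assumed cost
> {code}
> Under that assumed RPC cost, the default pool saturates not far above the
> 150-node guidance, which is when lengthening the heartbeat interval helps.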
> If it takes a very long time for a cluster to start up, and impala-shell
> consistently displays {{This Impala daemon is not ready to accept user
> requests}}, the statestore might be taking too long to send the entire
> catalog topic to the cluster. In this case, consider adding
> {{--load_catalog_in_background=false}} to your catalog service configuration.
> This setting stops the catalog service from loading the entire catalog into
> memory at cluster startup. Instead, metadata for each table is loaded when
> the table is accessed for the first time.
> -----------------------------------------------------
> *We need to add these:*
> -----------------------------------------------------
> {{-statestore_heartbeat_tcp_timeout_seconds}}
> The time, in seconds, after which a heartbeat RPC to a subscriber times out.
> This setting protects against badly hung machines that cannot respond to the
> heartbeat RPC in short order. Increase this if intermittent heartbeat RPC
> timeouts appear in the statestore's log. You can use the max value of the
> "statestore.priority-topic-update-durations" metric on the statestore as a
> reference for a reasonable value. Note that priority topic updates are
> assumed to be small amounts of data that take a small amount of time to
> process (similar in complexity to a heartbeat).
> *Default:* 3
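> As a rough sizing aid, here is a minimal Python sketch. Reading the metric
> from the statestored debug web UI is assumed to happen manually, and the 2x
> headroom factor is a hypothetical choice:
> {code:python}
> from math import ceil
>
> def suggest_heartbeat_tcp_timeout_secs(observed_max_update_secs,
>                                        headroom=2.0,
>                                        default_secs=3):
>     """Suggest -statestore_heartbeat_tcp_timeout_seconds from the observed
>     max of "statestore.priority-topic-update-durations" (read manually from
>     the statestored debug web UI). The 2x headroom is a hypothetical choice."""
>     return max(default_secs, ceil(observed_max_update_secs * headroom))
>
> # e.g. an observed max of 2.4s suggests a 5s timeout instead of the 3s default
> print(suggest_heartbeat_tcp_timeout_secs(2.4))  # -> 5
> {code}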
> {{-statestore_max_missed_heartbeats}}
> Maximum number of consecutive heartbeat messages an impalad can miss before
> being declared failed by the statestore. You should not typically need to
> change this value.
> *Default:* 10
> {{-statestore_subscriber_timeout_secs}}
> The amount of time (in seconds) that may elapse before the connection with
> the statestore is considered lost. This should be comparable to
> {{(statestore_heartbeat_frequency_ms / 1000 +
> statestore_heartbeat_tcp_timeout_seconds) *
> statestore_max_missed_heartbeats}} (in seconds), so that subscribers do not
> re-register themselves too early and instead give the statestore a chance to
> resend heartbeats. You can also use the max value of the
> "statestore-subscriber.heartbeat-interval-time" metric on the impalads as a
> reference for a reasonable value.
> *Default:* 30
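> The sizing rule above, as a minimal Python sketch; the division by 1000
> converts the millisecond-valued flag to seconds so the units line up:
> {code:python}
> def suggest_subscriber_timeout_secs(heartbeat_frequency_ms=1000,
>                                     heartbeat_tcp_timeout_secs=3,
>                                     max_missed_heartbeats=10):
>     """Suggest -statestore_subscriber_timeout_secs per the rule above."""
>     per_attempt_secs = heartbeat_frequency_ms / 1000.0 + heartbeat_tcp_timeout_secs
>     return per_attempt_secs * max_missed_heartbeats
>
> print(suggest_subscriber_timeout_secs())  # -> 40.0 with the default flag values
> {code}
> With the default flag values this comes out to 40 seconds, slightly above the
> 30-second default.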
> -----------------------------------------------------