Quanlong Huang created IMPALA-10788:
---------------------------------------
Summary: Statestore Scalability document should mention
statestore_subscriber_timeout_secs and statestore_heartbeat_tcp_timeout_seconds
Key: IMPALA-10788
URL: https://issues.apache.org/jira/browse/IMPALA-10788
Project: IMPALA
Issue Type: Documentation
Reporter: Quanlong Huang
Assignee: Quanlong Huang
The current document about statestore scalability is
[http://impala.apache.org/docs/build/html/topics/impala_scalability.html#ariaid-title3]
We should add statestore_heartbeat_tcp_timeout_seconds and
statestore_subscriber_timeout_secs to the doc. They also impact the heartbeat
timeout detection. Incorrect settings may result in false positive liveness
dection and queries failed by "Cancelled due to unreachable impalad(s)" error.
Current document:
---------------------------------------------------
h3. Scalability Considerations for the Impala Statestore
Before Impala 2.1, the statestore sent only one kind of message to its
subscribers. This message contained all updates for any topics that a
subscriber had subscribed to. It also served to let subscribers know that the
statestore had not failed, and conversely the statestore used the success of
sending a heartbeat to a subscriber to decide whether or not the subscriber had
failed.
Combining topic updates and failure detection in a single message led to
bottlenecks in clusters with large numbers of tables, partitions, and HDFS data
blocks. When the statestore was overloaded with metadata updates to transmit,
heartbeat messages were sent less frequently, sometimes causing subscribers to
time out their connection with the statestore. Increasing the subscriber
timeout and decreasing the frequency of statestore heartbeats worked around the
problem, but reduced responsiveness when the statestore failed or restarted.
As of Impala 2.1, the statestore now sends topic updates and heartbeats in
separate messages. This allows the statestore to send and receive a steady
stream of lightweight heartbeats, and removes the requirement to send topic
updates according to a fixed schedule, reducing statestore network overhead.
The statestore now has the following relevant configuration flags for the
statestored daemon:
{{-statestore_num_update_threads}}
The number of threads inside the statestore dedicated to sending topic
updates. You should not typically need to change this value.
*Default:* 10
{{-statestore_update_frequency_ms}}
The frequency, in milliseconds, with which the statestore tries to send topic
updates to each subscriber. This is a best-effort value; if the statestore is
unable to meet this frequency, it sends topic updates as fast as it can. You
should not typically need to change this value.
*Default:* 2000
{{-statestore_num_heartbeat_threads}}
The number of threads inside the statestore dedicated to sending heartbeats.
You should not typically need to change this value.
*Default:* 10
{{-statestore_heartbeat_frequency_ms}}
The frequency, in milliseconds, with which the statestore tries to send
heartbeats to each subscriber. This value should be good for large catalogs and
clusters up to approximately 150 nodes. Beyond that, you might need to increase
this value to make the interval longer between heartbeat messages.
*Default:* 1000 (one heartbeat message every second)
If it takes a very long time for a cluster to start up, and impala-shell
consistently displays {{This Impala daemon is not ready to accept user
requests}}, the statestore might be taking too long to send the entire catalog
topic to the cluster. In this case, consider adding
{{--load_catalog_in_background=false}} to your catalog service configuration.
This setting stops the statestore from loading the entire catalog into memory
at cluster startup. Instead, metadata for each table is loaded when the table
is accessed for the first time.
-----------------------------------------------------
*We need to add these:*
-----------------------------------------------------
{{-statestore_heartbeat_tcp_timeout_seconds}}
The time after which a heartbeat RPC to a subscriber will timeout. This
setting protects against badly hung machines that are not able to respond to
the heartbeat RPC in short order. Increase this if there are intermittent
heartbeat RPC timeouts shown in statestore's log. You can reference the max
value of "statestore.priority-topic-update-durations" metric on statestore to
get a reasonable value. Note that priority topic updates are assumed to be
small amounts of data that take a small amount of time to process (similar to
the heartbeat complexity).
*Default:* 3
{{-statestore_max_missed_heartbeats}}
Maximum number of consecutive heartbeat messages an impalad can miss before
being declared failed by the statestore. You should not typically need to
change this value.
*Default:* 10
{{-statestore_subscriber_timeout_secs}}
The amount of time (in seconds) that may elapse before the connection with the
statestore is considered lost. This should be comparable to
{{(statestore_heartbeat_frequency_ms +
statestore_heartbeat_tcp_timeout_seconds) * statestore_max_missed_heartbeats}},
so subscribers won't reregister themselves too early and allow statestore to
resend heartbeats. You can also reference the max value of
"statestore-subscriber.heartbeat-interval-time" metrics on impalads to get a
reasonable value.
*Default:* 30
-----------------------------------------------------
--
This message was sent by Atlassian Jira
(v8.3.4#803005)