This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch branch-1.9.x
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/branch-1.9.x by this push:
     new 16f8fc8  [known_issues] the scalability of location awareness
16f8fc8 is described below

commit 16f8fc8cccd8bc962eb731d8ae8a0baa0d1369cd
Author: Alexey Serbin <[email protected]>
AuthorDate: Fri Mar 8 15:13:05 2019 -0800

    [known_issues] the scalability of location awareness
    
    Added information about poor scalability of the location
    awareness implementation in 1.9.0 in terms of number of
    concurrent clients connecting to cluster.
    
    Change-Id: I04dad488a377bf4cd36534d648a69d2fb2444fea
    Reviewed-on: http://gerrit.cloudera.org:8080/12706
    Tested-by: Kudu Jenkins
    Reviewed-by: Adar Dembo <[email protected]>
    Reviewed-by: Andrew Wong <[email protected]>
---
 docs/known_issues.adoc | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index 83890be..05ac621 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -104,8 +104,6 @@
 
 == Cluster management
 
-* Rack awareness is not supported.
-
 * Multi-datacenter is not supported.
 
 * Rolling restart is not supported.
@@ -142,6 +140,24 @@
 * Maximum number of tablets per table for each tablet server is 60,
   post-replication (assuming the default replication factor of 3), at table-creation time.
 
+* When enabled, location awareness in its current implementation doesn't scale
+  with the number of clients connecting to a Kudu cluster simultaneously.
+  If the rate of new client connections stays high (e.g., 100 requests/second)
+  for a long period of time, or if a huge number of such requests arrive at
+  the Kudu masters in a short burst (e.g., 10000 requests within one second),
+  the Kudu masters might experience RPC queue overflows and overall slowness.
+  The slowness becomes more prominent as the in-memory size of the master
+  process grows, where the major contributing factor is the total number of
+  tablet replicas ever created in the cluster. Eventually, the issue may
+  manifest as write and scan operations timing out. If that happens, use the
+  following workaround:
+** Disable assignment of locations to clients by adding `--unlock_unsafe_flags`
+   and `--master_client_location_assignment_enabled=false` to the list of
+   runtime flags for Kudu masters. This retains the benefits of location
+   awareness for initial placement of tablet replicas and re-replication, but
+   clients will not be able to use location information to choose
+   the closest tablet server for scan operations.
+
 == Replication and Backup Limitations
 
 * Kudu does not currently include any built-in features for backup and restore.
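
For illustration only, the workaround described in the added known-issues entry
could be written as a fragment of a Kudu master flag file. This is a minimal
sketch, not part of the commit: the file path in the comment is hypothetical,
and it assumes that the gflags option for unlocking unsafe-tagged flags in Kudu
is `--unlock_unsafe_flags`.

```
# Hypothetical location: /etc/kudu/conf/master.gflagfile
# Assumption: unsafe-tagged flags are unlocked with --unlock_unsafe_flags.
--unlock_unsafe_flags
# Stop the masters from assigning locations to connecting clients.
--master_client_location_assignment_enabled=false
```

A flag file is read at process startup, so a change like this would take effect
only after the masters are restarted with it.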
