Repository: kudu
Updated Branches:
  refs/heads/master b552d9118 -> 953a09b82
[docs] Add "one client only" best practice for kudu-spark

Change-Id: Ibaf369315b8627674ba64e6418d153568ded6fe8
Reviewed-on: http://gerrit.cloudera.org:8080/11409
Tested-by: Will Berkeley <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e3570519
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e3570519
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e3570519

Branch: refs/heads/master
Commit: e3570519b200a0ffbd713798bc8aabd6f36ed3b7
Parents: b552d91
Author: Will Berkeley <[email protected]>
Authored: Mon Sep 10 10:45:30 2018 -0700
Committer: Will Berkeley <[email protected]>
Committed: Mon Sep 10 18:43:43 2018 +0000

----------------------------------------------------------------------
 docs/developing.adoc | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kudu/blob/e3570519/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index 98db2ba..49d8c7e 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -217,6 +217,23 @@ mode, the submitting user must have an active Kerberos ticket granted through
 name and keytab location must be provided through the `--principal` and
 `--keytab` arguments to `spark2-submit`.
 
+=== Spark Integration Best Practices
+
+==== Avoid multiple Kudu clients per cluster.
+
+One common Kudu-Spark coding error is instantiating extra `KuduClient` objects.
+In kudu-spark, a `KuduClient` is owned by the `KuduContext`. Spark application code
+should not create another `KuduClient` connecting to the same cluster. Instead,
+application code should use the `KuduContext` to access a `KuduClient` using
+`KuduContext#syncClient`.
+
+To diagnose multiple `KuduClient` instances in a Spark job, look for signs in
+the logs of the master being overloaded by many `GetTableLocations` or
+`GetTabletLocations` requests coming from different clients, usually around the
+same time. This symptom is especially likely in Spark Streaming code,
+where creating a `KuduClient` per task will result in periodic waves of master
+requests from new clients.
+
 === Spark Integration Known Issues and Limitations
 
 - Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration
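
The "one client only" practice added in this diff can be illustrated with a minimal Scala sketch. It is not part of the commit. The master address, table name, and object name are hypothetical placeholders, and the code assumes the kudu-spark and Spark SQL artifacts are on the classpath and that a Kudu cluster is reachable.

```scala
import org.apache.kudu.client.KuduClient
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object SingleKuduClientExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical values, for illustration only.
    val kuduMasters = "kudu-master.example.com:7051"
    val tableName = "example_table"

    val spark = SparkSession.builder()
      .appName("single-kudu-client")
      .getOrCreate()

    // The KuduContext owns the KuduClient for this cluster.
    val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)

    // Reuse the context's client rather than building a second one with
    // new KuduClient.KuduClientBuilder(kuduMasters).build().
    val client: KuduClient = kuduContext.syncClient

    if (client.tableExists(tableName)) {
      val table = client.openTable(tableName)
      println(s"Columns in $tableName: ${table.getSchema.getColumnCount}")
    }

    // The client is not closed here because the KuduContext owns it.
    spark.stop()
  }
}
```

The sketch never calls `close()` on the shared client. Since the `KuduContext` owns it, closing it from application code would presumably affect other users of the same context.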
