Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/108#discussion_r10417262
--- Diff: docs/monitoring.md ---
@@ -48,11 +48,22 @@ Each instance can report to zero or more _sinks_. Sinks are contained in the
* `ConsoleSink`: Logs metrics information to the console.
* `CSVSink`: Exports metrics data to CSV files at regular intervals.
-* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
* `JmxSink`: Registers metrics for viewing in a JMX console.
* `MetricsServlet`: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
* `GraphiteSink`: Sends metrics to a Graphite node.
+Spark also supports a Ganglia sink which is not included in the default build due to
+licensing restrictions:
+
+* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
+
+To install the `GangliaSink` you'll need to perform a custom build of Spark. _**Note that
+by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
+code in your Spark package**_. For sbt users, set the `SPARK_GANGLIA_LGPL`
+environment variable before building. For Maven users, enable the
+`-Pspark-ganglia-lgpl` profile. For users linking applications against Spark,
+include the `spark-ganglia-lgpl` artifact as a dependency.
--- End diff --
This is kind of confusing because it's not clear that you need to *both*
build a custom Spark *and* link applications against spark-ganglia-lgpl. As
written, it sounds like you do just one of these: the Maven build, the sbt
build, or adding the dependency. In fact you need to deploy the special build
to the cluster and also link your app against this artifact.
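Maybe the doc could spell out both steps explicitly, something like the sketch below. The env var and Maven profile names come from the patch; the exact sbt invocation and the artifact coordinates (group ID, Scala-version suffix) are illustrative guesses, not verified against the build:

```sh
# Step 1: build a Spark distribution that bundles the LGPL Ganglia sink,
# then deploy this build to every node in the cluster.
SPARK_GANGLIA_LGPL=true sbt/sbt assembly       # sbt build (env var from the patch)
# or, equivalently, with Maven:
mvn -Pspark-ganglia-lgpl -DskipTests package   # profile name from the patch

# Step 2: ALSO link your application against the spark-ganglia-lgpl artifact,
# e.g. in your app's pom.xml (coordinates/version are illustrative):
#   <dependency>
#     <groupId>org.apache.spark</groupId>
#     <artifactId>spark-ganglia-lgpl_2.10</artifactId>
#     <version>...</version>
#   </dependency>
```

Making it explicit that step 2 is *in addition to* step 1 would remove the ambiguity.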