On 19 Mar 2016, at 16:16, Pete Robbins <robbin...@gmail.com> wrote:
There are several open Jiras to add new Sinks

OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194
StatsD https://issues.apache.org/jira/browse/SPARK-11574

statsd is nicely easy to test: either listen in on a (localhost, port) or simply create a socket and force it into the sink for the test run

Kafka https://issues.apache.org/jira/browse/SPARK-13392

Some have PRs from 2015 so I'm assuming there is not the desire to integrate these into core Spark. Opening up the Sink/Source interfaces would at least allow these to exist somewhere such as spark-packages without having to pollute the o.a.s namespace

On Sat, 19 Mar 2016 at 13:05 Gerard Maas <gerard.m...@gmail.com> wrote:

+1

On Mar 19, 2016 08:33, "Pete Robbins" <robbin...@gmail.com> wrote:

This seems to me to be unnecessarily restrictive. These are very useful extension points for adding 3rd party sources and sinks. I intend to make an Elasticsearch sink available on spark-packages but this will require a single class, the sink, to be in the org.apache.spark package tree. I could submit the package as a PR to the Spark codebase, and I'd be happy to do that but it could be a completely separate add-on. There are similar issues with writing a 3rd party metrics source which may not be of interest to the community at large so would probably not warrant inclusion in the Spark codebase.

Any thoughts?
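
For anyone wanting to try the "listen in on a (localhost, port)" idea for testing a StatsD sink mentioned earlier in this thread, here is a rough sketch using only the JDK. It is not Spark code: `sendMetric` is a hypothetical stand-in for whatever a real sink would do when it reports, and the metric name `spark.test.counter` is made up for illustration. The test binds a UDP socket on an ephemeral loopback port, points the sender at it, and checks what arrives.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdListenerSketch {

    // Hypothetical stand-in for a sink: sends one StatsD-style line over UDP.
    // A real sink would be configured with the same host and port.
    public static void sendMetric(String host, int port, String line) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket client = new DatagramSocket()) {
            client.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    // Binds an ephemeral UDP port on loopback, triggers the sender,
    // and returns the text of the first datagram received.
    public static String roundTrip(String line) throws Exception {
        try (DatagramSocket listener =
                     new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            listener.setSoTimeout(2000); // fail the test rather than hang forever
            sendMetric(listener.getLocalAddress().getHostAddress(),
                    listener.getLocalPort(), line);
            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            listener.receive(packet);
            return new String(packet.getData(), 0, packet.getLength(),
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // "name:value|c" is the StatsD counter line format.
        System.out.println(roundTrip("spark.test.counter:1|c"));
    }
}
```

Binding to port 0 lets the OS pick a free port, so parallel test runs should not collide. The other option mentioned, creating a socket and forcing it into the sink, would need the sink to accept an injected socket, which this sketch does not attempt.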