On 19 Mar 2016, at 16:16, Pete Robbins <robbin...@gmail.com> wrote:


There are several open Jiras to add new Sinks

OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194
StatsD https://issues.apache.org/jira/browse/SPARK-11574


statsd is easy to test: either listen on a (localhost, port) pair or 
simply create a socket and inject it into the sink for the test run
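
A rough sketch of the first option (listening on a localhost port), in plain 
Java with no Spark or StatsD client dependency. The class name StatsdLoopback, 
the method roundTrip and the metric name spark.test.counter are made up for 
illustration; in a real test the sink under test would send the packet instead 
of the stand-in sender used here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper, not part of Spark or of any StatsD client library.
final class StatsdLoopback {

    // Sends one StatsD line over UDP to a listener bound to the loopback
    // interface and returns the text the listener received.
    static String roundTrip(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for a free ephemeral port, so parallel tests do not clash.
        try (DatagramSocket listener = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            listener.setSoTimeout(2000); // fail rather than hang if nothing arrives

            // Stand-in for the sink: a real test would point the sink at
            // (loopback, listener.getLocalPort()) and trigger a report.
            byte[] payload = line.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                    loopback, listener.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            listener.receive(received);
            return new String(received.getData(), received.getOffset(),
                    received.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "name:value|c" is the StatsD wire format for a counter.
        System.out.println(roundTrip("spark.test.counter:1|c"));
    }
}
```

The second option (forcing a socket into the sink) would depend on how the 
sink is constructed, so it is not sketched here.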


Kafka https://issues.apache.org/jira/browse/SPARK-13392

Some have PRs from 2015, so I'm assuming there is no desire to integrate 
these into core Spark. Opening up the Sink/Source interfaces would at least 
allow these to exist somewhere such as spark-packages without having to pollute 
the o.a.s namespace.


On Sat, 19 Mar 2016 at 13:05 Gerard Maas <gerard.m...@gmail.com> wrote:

+1

On Mar 19, 2016 08:33, "Pete Robbins" <robbin...@gmail.com> wrote:
This seems to me to be unnecessarily restrictive. These are very useful 
extension points for adding 3rd party sources and sinks.

I intend to make an Elasticsearch sink available on spark-packages, but this 
will require a single class, the sink, to be in the org.apache.spark package 
tree. I could submit the package as a PR to the Spark codebase, and I'd be 
happy to do that, but it could just as well be a completely separate add-on.

There are similar issues with writing a 3rd party metrics source which may not 
be of interest to the community at large so would probably not warrant 
inclusion in the Spark codebase.

Any thoughts?
