+1. My only additions would be to expand this to work with Spark SQL and to
give the Spark compute context
(https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-r-server-compute-contexts)
access to the data from the sink. I'd love to take these other bits on if
there's enough interest and commit them back to the community.

________________________________
From: Tristan Stevens <tris...@cloudera.com>
Sent: Friday, February 3, 2017 3:35 AM
To: dev@flume.apache.org; Johny Rufus John
Subject: Re: Flume+ML [Discussion]

Johny,
This is definitely the right way to do this.

There's a Sink available already (from the docs that you provided) at
https://github.com/apache/spark/blob/master/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkSink.scala

There's no reason that we couldn't distribute this with Flume, rather than
shade our own copy.

Thoughts anyone?

Tristan

On 3 February 2017 at 02:02:14, Johny Rufus John (johnyru...@gmail.com)
wrote:

Hi Saikat,

Have you considered this approach?

http://spark.apache.org/docs/latest/streaming-flume-integration.html

http://stdatalabs.blogspot.in/2016/09/spark-streaming-part-2-real-time_10.html



Thanks,
Johny

On Thu, Feb 2, 2017 at 5:27 PM, Saikat Kanjilal <sxk1...@hotmail.com>
wrote:

> Hi Flume community,
> Would love to have input on this topic, as this is a pertinent use case
> that I'm exploring at work.
> Regards
>
> Sent from my iPhone
>
> On Feb 1, 2017, at 4:36 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
>
> Bump
>
>
> ________________________________
> From: Saikat Kanjilal <sxk1...@hotmail.com>
> Sent: Wednesday, February 1, 2017 8:46 AM
> To: dev@flume.apache.org
> Subject: Flume+ML [Discussion]
>
> Hi Folks,
>
> I've been a bit delayed working on the graph sink for Flume
> (https://github.com/skanjila/flume-ng-graphstore-sink). In the meantime, I
> was wondering if there's been any thought or interest in connecting Flume
> to Spark. I have a potential use case where we need to extract data from
> multiple data sources, apply a set of transformations to this data, and then
> dump it to a columnar store for downstream processing through a
> Revoscale R cluster, which uses Spark underneath. I'd be interested in
> leading this effort if there's enough interest in the community around use
> cases for this.
>
> Look forward to hearing from folks.
>
