+1. My only addition would be to expand this to work with Spark SQL and to give the Spark compute context (https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-r-server-compute-contexts) access to the data from the sink. I'd love to take these other bits on if there's enough interest and commit them back to the community.
________________________________
From: Tristan Stevens <tris...@cloudera.com>
Sent: Friday, February 3, 2017 3:35 AM
To: dev@flume.apache.org; Johny Rufus John
Subject: Re: Flume+ML [Discussion]

Johny,

This is definitely the right way to do this. There's a Sink available already (from the docs that you provided) at https://github.com/apache/spark/blob/master/external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkSink.scala

There's no reason that we couldn't distribute this with Flume, rather than shade our own copy.

Thoughts anyone?

Tristan

On 3 February 2017 at 02:02:14, Johny Rufus John (johnyru...@gmail.com) wrote:

Hi Saikat,

Have you considered this approach,
http://spark.apache.org/docs/latest/streaming-flume-integration.html
http://stdatalabs.blogspot.in/2016/09/spark-streaming-part-2-real-time_10.html

Thanks,
Johny

On Thu, Feb 2, 2017 at 5:27 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

> Hi Flume community,
> Would love to have inputs on this topic as this is a pertinent use case
> that I'm exploring at work.
> Regards
>
> Sent from my iPhone
>
> On Feb 1, 2017, at 4:36 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> Bump
>
> ________________________________
> From: Saikat Kanjilal <sxk1...@hotmail.com>
> Sent: Wednesday, February 1, 2017 8:46 AM
> To: dev@flume.apache.org
> Subject: Flume+ML [Discussion]
>
> Hi Folks,
>
> I've been a bit delayed working on the graph sink for Flume
> (https://github.com/skanjila/flume-ng-graphstore-sink). In the meantime I
> was wondering if there's been any thought or interest in connecting Flume
> to Spark. I have a potential use case where we need to extract data out of
> multiple data sources, do a set of transformations on this data, and then
> dump this data to a columnar store for downstream processing through a
> Revoscale R cluster which uses Spark underneath. I'd be interested in
> leading this effort if there's enough interest in the community around use
> cases for this.
>
> Look forward to hearing from folks.