Thanks TD. I am going to try the Python-Scala hybrid approach by using
Scala only for the custom Redis sink and Python for the rest of the app.
I understand it might not be as efficient as writing the whole app in
Scala, but unfortunately I am constrained on Scala resources. Have you
come across other use cases where people have resorted to such a
Python-Scala hybrid approach?
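
For context, here is a rough sketch of how I am imagining the Scala side
of the hybrid being wired up. RedisSinkBridge, writeToRedis, and the
host/port are just placeholder names of mine, and it assumes a
ForeachWriter[Row] implementation along the lines of the RedisForeachWriter
sketched in the quoted message below:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.streaming.StreamingQuery

    // Placeholder entry point on the Scala side of the hybrid: the
    // PySpark driver hands over its streaming DataFrame and this object
    // attaches the Redis foreach sink and starts the query.
    object RedisSinkBridge {
      def writeToRedis(df: DataFrame, checkpointDir: String): StreamingQuery = {
        df.writeStream
          .foreach(new RedisForeachWriter("localhost", 6379)) // assumed host/port
          .option("checkpointLocation", checkpointDir)
          .start()
      }
    }

On the Python side I would presumably reach this object through PySpark's
internal py4j gateway (spark._jvm together with df._jdf), which are
private internals rather than a public API, so I realize this is only a
sketch.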

Regards,
Priyank



On Wed, Jul 26, 2017 at 1:46 AM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Hello Priyank
>
> Writing something purely in Scala/Java would be the most efficient. Even
> if we expose Python APIs that allow writing custom sinks in pure Python,
> it won't be as efficient as the Scala/Java foreach, because the data
> would have to cross the JVM/PVM boundary, which has significant overhead.
> So Scala/Java foreach is always going to be the best option.
>
> TD
>
> On Tue, Jul 25, 2017 at 6:05 PM, Priyank Shrivastava <
> priy...@asperasoft.com> wrote:
>
>> I am trying to write key-value pairs to Redis using a DataStreamWriter
>> object with the PySpark Structured Streaming APIs. I am using Spark 2.2.
>>
>> Since the foreach sink is not supported for Python (see here
>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach>),
>> I am trying to find some alternatives.
>>
>> One alternative is to write a separate Scala module just to push data
>> into Redis using foreach, since ForeachWriter
>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter>
>> is supported in Scala. But this doesn't seem like an efficient approach,
>> and it adds deployment overhead because I would then have to support
>> Scala in my app.
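>>
>> For illustration, here is roughly what I imagine such a Scala
>> ForeachWriter would look like. This is only a minimal sketch assuming a
>> Jedis client and a streaming DataFrame with string "key" and "value"
>> columns; the class name, connection details, and column names are
>> placeholders of mine:
>>
>>     import org.apache.spark.sql.{ForeachWriter, Row}
>>     import redis.clients.jedis.Jedis
>>
>>     // Sketch of a Redis sink; column names and connection details are
>>     // assumptions for illustration only.
>>     class RedisForeachWriter(host: String, port: Int)
>>         extends ForeachWriter[Row] {
>>
>>       // Jedis connections are not serializable, so open one per
>>       // partition on the executor rather than on the driver.
>>       @transient private var jedis: Jedis = _
>>
>>       // Called once per partition per trigger; return true to process it.
>>       override def open(partitionId: Long, version: Long): Boolean = {
>>         jedis = new Jedis(host, port)
>>         true
>>       }
>>
>>       // Called for every row in the partition.
>>       override def process(record: Row): Unit = {
>>         jedis.set(record.getAs[String]("key"), record.getAs[String]("value"))
>>       }
>>
>>       // Called when the partition finishes, successfully or not.
>>       override def close(errorOrNull: Throwable): Unit = {
>>         if (jedis != null) jedis.close()
>>       }
>>     }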
>>
>> Another approach is obviously to use Scala instead of Python, which is
>> fine, but I want to make sure that I absolutely cannot use Python for
>> this problem before I take that path.
>>
>> Would appreciate some feedback and alternative design approaches for this
>> problem.
>>
>> Thanks.
>>
>>
>>
>>
>
