Relevant:
https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html


This is a true stream-stream join: it will automatically buffer delayed
data and join the two streams with SQL join semantics. Please check it
out :)
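
A minimal PySpark sketch of what that can look like (the topic names,
broker address, and the 15-minute bound below are placeholders, not from
this thread):

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("stream-stream-join").getOrCreate()

# Read a Kafka topic as a streaming DataFrame; key/value arrive as binary.
def read_topic(topic):
    return (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", topic)
            .load()
            .selectExpr("CAST(key AS STRING) AS k",
                        "CAST(value AS STRING) AS v",
                        "timestamp"))

# Watermarks bound how long Spark buffers late, unmatched rows.
left = read_topic("topic_a").withWatermark("timestamp", "30 minutes")
right = read_topic("topic_b").withWatermark("timestamp", "30 minutes")

# Inner join on the key, with a time-range condition so the engine knows
# when buffered state can be discarded.
joined = left.alias("l").join(
    right.alias("r"),
    expr("""
        l.k = r.k AND
        r.timestamp BETWEEN l.timestamp - INTERVAL 15 minutes
                        AND l.timestamp + INTERVAL 15 minutes
    """))

(joined.writeStream
 .format("console")
 .outputMode("append")
 .start()
 .awaitTermination())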

TD



On Wed, Mar 14, 2018 at 12:07 PM, Dylan Guedes <djmggue...@gmail.com> wrote:

> I misread it and thought your question was whether pyspark supports Kafka,
> lol. Sorry!
>
> On Wed, Mar 14, 2018 at 3:58 PM, Aakash Basu <aakash.spark....@gmail.com>
> wrote:
>
>> Hey Dylan,
>>
>> Great!
>>
>> Could you reply to my initial mail and also to the latest one?
>>
>> Thanks,
>> Aakash.
>>
>> On 15-Mar-2018 12:27 AM, "Dylan Guedes" <djmggue...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been using Kafka with pyspark since 2.1.
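>>>
>>> For example, a minimal read with the Structured Streaming Kafka source
>>> (spark-sql-kafka-0-10); the broker address and topic name below are
>>> placeholders:
>>>
>>> from pyspark.sql import SparkSession
>>>
>>> spark = SparkSession.builder.appName("kafka-read").getOrCreate()
>>>
>>> # Subscribe to one topic; key/value arrive as binary, so cast to strings.
>>> df = (spark.readStream
>>>       .format("kafka")
>>>       .option("kafka.bootstrap.servers", "localhost:9092")
>>>       .option("subscribe", "my_topic")
>>>       .load()
>>>       .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))
>>>
>>> df.writeStream.format("console").start().awaitTermination()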
>>>
>>> On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu <aakash.spark....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm yet to.
>>>>
>>>> Just want to know: does Spark 2.3 with the Kafka 0.10 Spark package
>>>> allow Python? I read somewhere that, as of now, Scala and Java are the
>>>> languages to be used.
>>>>
>>>> Please correct me if I am wrong.
>>>>
>>>> Thanks,
>>>> Aakash.
>>>>
>>>> On 14-Mar-2018 8:24 PM, "Georg Heiler" <georg.kf.hei...@gmail.com>
>>>> wrote:
>>>>
>>>>> Did you try Spark 2.3 with Structured Streaming? Its watermarking and
>>>>> plain SQL support might be really interesting for you.
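>>>>>
>>>>> A quick sketch of that combination (the socket source and the window
>>>>> sizes are just for illustration):
>>>>>
>>>>> from pyspark.sql import SparkSession
>>>>>
>>>>> spark = SparkSession.builder.appName("watermark-sql").getOrCreate()
>>>>>
>>>>> # Any streaming source works; the socket source is just easy to demo.
>>>>> lines = (spark.readStream
>>>>>          .format("socket")
>>>>>          .option("host", "localhost")
>>>>>          .option("port", 9999)
>>>>>          .option("includeTimestamp", "true")
>>>>>          .load())
>>>>>
>>>>> # Events more than 10 minutes late (in event time) may be dropped.
>>>>> lines.withWatermark("timestamp", "10 minutes") \
>>>>>      .createOrReplaceTempView("events")
>>>>>
>>>>> # Plain SQL over the stream: windowed counts per value.
>>>>> counts = spark.sql("""
>>>>>     SELECT window(timestamp, '5 minutes') AS win, value, COUNT(*) AS n
>>>>>     FROM events
>>>>>     GROUP BY window(timestamp, '5 minutes'), value
>>>>> """)
>>>>>
>>>>> (counts.writeStream
>>>>>  .outputMode("update")
>>>>>  .format("console")
>>>>>  .start()
>>>>>  .awaitTermination())
>>>>>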
>>>>> Aakash Basu <aakash.spark....@gmail.com> wrote on Wed., 14 March 2018
>>>>> at 14:57:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> *Info (using):*
>>>>>> *Spark Streaming Kafka 0.8 package*
>>>>>> *Spark 2.2.1*
>>>>>> *Kafka 1.0.1*
>>>>>>
>>>>>> As of now, I am feeding paragraphs into the Kafka console producer, and
>>>>>> my Spark job, acting as a receiver, prints the flattened words, which is
>>>>>> a purely RDD operation.
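>>>>>>
>>>>>> Roughly, the current toy pipeline looks like this (broker and topic
>>>>>> name are placeholders):
>>>>>>
>>>>>> # spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.1 app.py
>>>>>> from pyspark import SparkContext
>>>>>> from pyspark.streaming import StreamingContext
>>>>>> from pyspark.streaming.kafka import KafkaUtils
>>>>>>
>>>>>> sc = SparkContext(appName="word-flatten")
>>>>>> ssc = StreamingContext(sc, 5)  # 5-second micro-batches
>>>>>>
>>>>>> # Direct stream from one topic; records arrive as (key, value) pairs.
>>>>>> stream = KafkaUtils.createDirectStream(
>>>>>>     ssc, ["paragraphs"], {"metadata.broker.list": "localhost:9092"})
>>>>>>
>>>>>> # Flatten each paragraph (the value) into words, a plain RDD-style op.
>>>>>> stream.flatMap(lambda kv: kv[1].split()).pprint()
>>>>>>
>>>>>> ssc.start()
>>>>>> ssc.awaitTermination()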
>>>>>>
>>>>>> *My motive is to continuously read two tables (which keep being
>>>>>> updated) as two distinct Kafka topics, load them as two Spark
>>>>>> DataFrames, join them on a key, and produce the output.* (I am from a
>>>>>> Spark-SQL background, so pardon my Spark-SQL-ish writing.)
>>>>>>
>>>>>> *It may happen that the first topic receives new data 15 minutes before
>>>>>> the second one; how should I proceed in that scenario? I must not lose
>>>>>> any data.*
>>>>>>
>>>>>> As of now, I simply want to pass paragraphs, read them as RDDs, convert
>>>>>> them to DataFrames, and then join them to get the common keys as the
>>>>>> output (just for R&D).
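>>>>>>
>>>>>> Something like this per-batch sketch is what I have in mind (assuming
>>>>>> each Kafka value is a "key,payload" CSV line; stream_a and stream_b are
>>>>>> placeholder DStreams like the one above):
>>>>>>
>>>>>> from pyspark.sql import SparkSession
>>>>>>
>>>>>> # Creating a SparkSession also enables rdd.toDF() below.
>>>>>> spark = SparkSession.builder.getOrCreate()
>>>>>>
>>>>>> def join_batches(rdd_a, rdd_b):
>>>>>>     df_a = rdd_a.map(lambda kv: kv[1].split(",")).toDF(["k", "a"])
>>>>>>     df_b = rdd_b.map(lambda kv: kv[1].split(",")).toDF(["k", "b"])
>>>>>>     return df_a.join(df_b, "k").rdd  # keep rows whose key is in both
>>>>>>
>>>>>> # Join the two DStreams micro-batch by micro-batch.
>>>>>> stream_a.transformWith(join_batches, stream_b).pprint()
>>>>>>
>>>>>> Of course, this only matches keys that land in the same micro-batch,
>>>>>> which is exactly why the delayed-data case above worries me.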
>>>>>>
>>>>>> Started using Spark Streaming and Kafka today itself.
>>>>>>
>>>>>> Please help!
>>>>>>
>>>>>> Thanks,
>>>>>> Aakash.
>>>>>>
>>>>>
>>>
>
