Hey Dylan,

Great!

Could you please reply to my initial mail and also to the latest one?

Thanks,
Aakash.

On 15-Mar-2018 12:27 AM, "Dylan Guedes" <djmggue...@gmail.com> wrote:

> Hi,
>
> I've been using Kafka with PySpark since 2.1.
>
> On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu <aakash.spark....@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have yet to try it.
>>
>> Just want to know: when will Spark 2.3 with the Kafka 0.10 package
>> support Python? I read somewhere that, as of now, Scala and Java are the
>> only supported languages.
>>
>> Please correct me if I am wrong.
>>
>> Thanks,
>> Aakash.
>>
>> On 14-Mar-2018 8:24 PM, "Georg Heiler" <georg.kf.hei...@gmail.com> wrote:
>>
>>> Did you try Spark 2.3 with Structured Streaming? The watermarking and
>>> plain SQL support there might be really interesting for you.
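>>>
>>> Something along these lines, as a rough sketch only -- the topic names
>>> topic_a/topic_b, the localhost broker, the JSON "id" field, and the
>>> 30-minute bounds are all assumptions you would adapt:
>>>
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.functions import col, expr, get_json_object
>>>
>>> spark = SparkSession.builder.appName("TwoTopicJoin").getOrCreate()
>>>
>>> def read_topic(topic):
>>>     # One Kafka topic as a streaming DataFrame (value arrives as bytes).
>>>     return (spark.readStream
>>>             .format("kafka")
>>>             .option("kafka.bootstrap.servers", "localhost:9092")
>>>             .option("subscribe", topic)
>>>             .load()
>>>             .selectExpr("CAST(value AS STRING) AS json", "timestamp"))
>>>
>>> a = (read_topic("topic_a")
>>>      .select(get_json_object("json", "$.id").alias("id_a"),
>>>              col("timestamp").alias("ts_a"))
>>>      .withWatermark("ts_a", "30 minutes"))
>>> b = (read_topic("topic_b")
>>>      .select(get_json_object("json", "$.id").alias("id_b"),
>>>              col("timestamp").alias("ts_b"))
>>>      .withWatermark("ts_b", "30 minutes"))
>>>
>>> # The time-range condition bounds the state Spark has to keep, and the
>>> # watermarks let one side run ~15 minutes ahead without losing rows.
>>> joined = a.join(b, expr("""
>>>     id_a = id_b AND
>>>     ts_b BETWEEN ts_a - INTERVAL 30 MINUTES AND ts_a + INTERVAL 30 MINUTES
>>> """))
>>>
>>> (joined.writeStream
>>>  .format("console")
>>>  .outputMode("append")
>>>  .start()
>>>  .awaitTermination())
>>>
>>> You can also register the two streaming DataFrames as temp views and
>>> write the join in plain SQL via spark.sql.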
>>> Aakash Basu <aakash.spark....@gmail.com> schrieb am Mi. 14. März 2018
>>> um 14:57:
>>>
>>>> Hi,
>>>>
>>>> Info (using):
>>>> - Spark Streaming Kafka 0.8 package
>>>> - Spark 2.2.1
>>>> - Kafka 1.0.1
>>>>
>>>> As of now, I am feeding paragraphs into the Kafka console producer, and
>>>> my Spark job, acting as the receiver, prints the flattened words, which
>>>> is a plain RDD operation.
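>>>>
>>>> For reference, my current receiver is essentially this (the topic name
>>>> and broker address are placeholders):
>>>>
>>>> from pyspark import SparkContext
>>>> from pyspark.streaming import StreamingContext
>>>> from pyspark.streaming.kafka import KafkaUtils
>>>>
>>>> sc = SparkContext(appName="WordFlatten")
>>>> ssc = StreamingContext(sc, 10)  # 10-second micro-batches
>>>>
>>>> # Kafka 0.8 direct stream; each record is a (key, value) pair.
>>>> stream = KafkaUtils.createDirectStream(
>>>>     ssc, ["paragraphs"], {"metadata.broker.list": "localhost:9092"})
>>>>
>>>> # Flatten every paragraph into words -- a plain RDD-style operation.
>>>> stream.map(lambda kv: kv[1]) \
>>>>       .flatMap(lambda line: line.split(" ")) \
>>>>       .pprint()
>>>>
>>>> ssc.start()
>>>> ssc.awaitTermination()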
>>>>
>>>> My motive is to continuously read two tables (which keep getting
>>>> updated) from two distinct Kafka topics as two Spark DataFrames, join
>>>> them on a key, and produce the output. (I am from a Spark-SQL
>>>> background, so pardon my Spark-SQL-ish writing.)
>>>>
>>>> It may happen that the first topic receives new data 15 minutes before
>>>> the second one does. How should I proceed in that scenario? I must not
>>>> lose any data.
>>>>
>>>> As of now, I want to simply pass paragraphs, read them as an RDD,
>>>> convert it to a DataFrame, and then join to get the common keys as the
>>>> output. (Just for R&D; a rough sketch of what I mean is below.)
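>>>>
>>>> Roughly what I have in mind, purely as a sketch -- the broker, the topic
>>>> names table_a/table_b, and the "id,value" payload format are made up,
>>>> and I am joining the pair DStreams directly instead of converting to
>>>> DataFrames first:
>>>>
>>>> from pyspark import SparkContext
>>>> from pyspark.streaming import StreamingContext
>>>> from pyspark.streaming.kafka import KafkaUtils
>>>>
>>>> sc = SparkContext(appName="TwoTopicRnD")
>>>> ssc = StreamingContext(sc, 60)  # 60-second batches
>>>> brokers = {"metadata.broker.list": "localhost:9092"}
>>>>
>>>> def pairs(topic):
>>>>     # Hypothetical "id,value" payloads -> (id, value) pairs.
>>>>     return (KafkaUtils.createDirectStream(ssc, [topic], brokers)
>>>>             .map(lambda kv: tuple(kv[1].split(",", 1))))
>>>>
>>>> a = pairs("table_a")
>>>> b = pairs("table_b")
>>>>
>>>> # A 30-minute sliding window on each side keeps rows around long enough
>>>> # for a match that shows up ~15 minutes later on the other topic; the
>>>> # cost is duplicate matches across overlapping windows.
>>>> a.window(1800, 60).join(b.window(1800, 60)).pprint()
>>>>
>>>> ssc.start()
>>>> ssc.awaitTermination()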
>>>>
>>>> I started using Spark Streaming and Kafka just today.
>>>>
>>>> Please help!
>>>>
>>>> Thanks,
>>>> Aakash.
>>>>
>>>
>
