Re: Structured Streaming with Kafka sources/sinks

Cody Koeninger Tue, 30 Aug 2016 09:13:01 -0700

In case it wasn't obvious from the ticket, I'm happy to work on this. I
just don't want to end up in a situation where the work I do conflicts
with or duplicates work that's already being done.


On Tue, Aug 30, 2016 at 11:02 AM, Reynold Xin <[email protected]> wrote:
> In this case, simply not much progress has been made, possibly because
> people have been busy with other work.
>
> Ofir, it looks like you have spent a non-trivial amount of time thinking
> about this topic and have even designed something that works. Can you chime
> in on the JIRA ticket with your thoughts and your prototype? That would be
> tremendously useful to the project.
>
> On Tue, Aug 30, 2016 at 11:44 PM, Nicholas Chammas
> <[email protected]> wrote:
>>
>> > I personally find it disappointing that a big chunk of Spark's design
>> > and development is happening behind closed curtains.
>>
>> I'm not too familiar with Streaming, but I see design docs and proposals
>> for ML and SQL published here and on JIRA all the time, and they are
>> discussed extensively.
>>
>> For example, here are some ML JIRAs with extensive design discussions:
>> SPARK-6725, SPARK-13944, SPARK-16365
>>
>> Nick
>>
>> On Tue, Aug 30, 2016 at 11:10 AM Cody Koeninger <[email protected]>
>> wrote:
>>>
>>> Not that I wouldn't rather have more open communication around this
>>> issue...but what are people actually expecting to get out of
>>> structured streaming with regard to Kafka?
>>>
>>> There aren't any realistic pushdown-type optimizations available, and
>>> from what I could tell the last time I looked at structured streaming,
>>> resolving the event time vs processing time issue was still a ways
>>> off.
>>>
>>> On Tue, Aug 30, 2016 at 1:56 AM, Ofir Manor <[email protected]>
>>> wrote:
>>> > I personally find it disappointing that a big chunk of Spark's design
>>> > and development is happening behind closed curtains. It makes it harder
>>> > than necessary for me to work with Spark. In recent weeks we had to
>>> > improvise a temporary solution for reading from Kafka (from Structured
>>> > Streaming) to unblock our development, and I feel that if the design and
>>> > development of that feature had been done in the open, it would have
>>> > saved us a lot of hassle (and reduced the refactoring of our code base).
>>> >
>>> > It's hard not to compare it to other Apache projects. For example, I
>>> > believe most of the Apache Kafka full-time contributors work at a single
>>> > company, but as a community they manage to have a very transparent
>>> > design and development process, which seems to work great.
>>> >
>>> > Ofir Manor
>>> >
>>> > Co-Founder & CTO | Equalum
>>> >
>>> > Mobile: +972-54-7801286 | Email: [email protected]
>>> >
>>> >
>>> > On Mon, Aug 29, 2016 at 10:39 PM, Fred Reiss <[email protected]>
>>> > wrote:
>>> >>
>>> >> I think that the community really needs some feedback on the progress
>>> >> of this very important task. Many existing Spark Streaming applications
>>> >> can't be ported to Structured Streaming without Kafka support.
>>> >>
>>> >> Is there a design document somewhere? Or can someone from the
>>> >> Databricks team break down the existing monolithic JIRA issue into
>>> >> smaller steps that reflect the current development plan?
>>> >>
>>> >> Fred
>>> >>
>>> >>
>>> >> On Sat, Aug 27, 2016 at 2:32 PM, Koert Kuipers <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> that's great
>>> >>>
>>> >>> is this effort happening anywhere that is publicly visible? GitHub?
>>> >>>
>>> >>> On Tue, Aug 16, 2016 at 2:04 AM, Reynold Xin <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> We (the team at Databricks) are working on one currently.
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Aug 15, 2016 at 7:26 PM, Cody Koeninger <[email protected]>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> https://issues.apache.org/jira/browse/SPARK-15406
>>> >>>>>
>>> >>>>> I'm not working on it (yet?), and I never got an answer to the
>>> >>>>> question of who was planning to work on it.
>>> >>>>>
>>> >>>>> On Mon, Aug 15, 2016 at 9:12 PM, Guo, Chenzhao
>>> >>>>> <[email protected]>
>>> >>>>> wrote:
>>> >>>>> > Hi all,
>>> >>>>> >
>>> >>>>> > I’m trying to write Structured Streaming test code that will deal
>>> >>>>> > with a Kafka source. Currently Spark 2.0 doesn’t support Kafka
>>> >>>>> > sources/sinks.
>>> >>>>> >
>>> >>>>> > I found some Databricks slides saying that Kafka sources/sinks
>>> >>>>> > will be implemented in Spark 2.0, so is anybody working on this?
>>> >>>>> > And when will it be released?
>>> >>>>> >
>>> >>>>> > Thanks,
>>> >>>>> >
>>> >>>>> > Chenzhao Guo
>>> >>>>>
>>> >>>>>
>>> >>>>> ---------------------------------------------------------------------
>>> >>>>> To unsubscribe e-mail: [email protected]
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>

