Re: [spark runner dataset POC] workCount works !

Ismaël Mejía Fri, 22 Mar 2019 07:59:28 -0700

At the moment, Kyle's work on the portable Spark runner is based on
the older APIs (RDD/DStream), but unifying both is definitely a goal.
It is just a question of the maturity of the machinery needed for the
Structured Streaming (Dataset) translation.


Btw, congrats also to Kyle for starting this, awesome!

On Fri, Mar 22, 2019 at 3:24 PM Robert Bradshaw <rober...@google.com> wrote:
>
> Nice!
>
> Between this and the portability work
> (https://github.com/apache/beam/pull/8115), hopefully we'll have a
> modern Spark runner soon. Any idea on how hard (or easy?) it will be
> to merge those two?
>
>
> On Fri, Mar 22, 2019 at 9:29 AM Łukasz Gajowy <lgaj...@apache.org> wrote:
> >
> > Cool. :) Congrats and thank you for your work!
> >
> > Łukasz
> >
> > On Thu, Mar 21, 2019 at 6:51 PM Kenneth Knowles <k...@apache.org> wrote:
> >>
> >> Nice milestone!
> >>
> >> On Thu, Mar 21, 2019 at 10:49 AM Pablo Estrada <pabl...@google.com> wrote:
> >>>
> >>> This is pretty cool. Thanks for working on this and for sharing:)
> >>> Best
> >>> -P.
> >>>
> >>> On Thu, Mar 21, 2019, 8:18 AM Alexey Romanenko <aromanenko....@gmail.com> 
> >>> wrote:
> >>>>
> >>>> Good job! =)
> >>>> Congrats to all who were involved in moving this forward!
> >>>>
> >>>> Btw, for all who are interested in the progress of work on this runner, I 
> >>>> wanted to remind you that we have a #beam-spark channel on Slack where we 
> >>>> discuss all ongoing questions. Feel free to join!
> >>>>
> >>>> Alexey
> >>>>
> >>>> > On 21 Mar 2019, at 15:51, Jean-Baptiste Onofré <j...@nanthrax.net> 
> >>>> > wrote:
> >>>> >
> >>>> > Congrats and huge thanks !
> >>>> >
> >>>> > (I'm glad to be one of the little "launchers" of this effort ;) )
> >>>> >
> >>>> > Regards
> >>>> > JB
> >>>> >
> >>>> > On 21/03/2019 15:47, Ismaël Mejía wrote:
> >>>> >> This is excellent news. Congrats Etienne, Alexey and the others
> >>>> >> involved for the great work!
> >>>> >> On Thu, Mar 21, 2019 at 3:10 PM Etienne Chauchot 
> >>>> >> <echauc...@apache.org> wrote:
> >>>> >>>
> >>>> >>> Hi guys,
> >>>> >>>
> >>>> >>> We are glad to announce that the Spark runner POC, which was 
> >>>> >>> rewritten from scratch using the Structured Streaming framework and 
> >>>> >>> the Dataset API, can now run WordCount!
> >>>> >>>
> >>>> >>> It is still embryonic. For now it only runs in batch mode and there 
> >>>> >>> is no fancy stuff like state, timer, SDF, metrics, ... but it is 
> >>>> >>> still a major step forward !
> >>>> >>>
> >>>> >>> Streaming support work has just started.
> >>>> >>>
> >>>> >>> You can find the branch here: 
> >>>> >>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
> >>>> >>>
> >>>> >>> Enjoy,
> >>>> >>>
> >>>> >>> Etienne
> >>>> >>>
> >>>> >>>
> >>>>
