[jira] [Commented] (BEAM-9451) Optimize translation when Schema information is available in Spark Structured Streaming runner

Jira Thu, 05 Mar 2020 07:40:10 -0800


    [ https://issues.apache.org/jira/browse/BEAM-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052270#comment-17052270 ]


Ismaël Mejía commented on BEAM-9451:
------------------------------------

Ongoing exploratory WIP, for those interested:
https://github.com/iemejia/beam/tree/BEAM-9451-spark-structured-streaming-schema-translation

> Optimize translation when Schema information is available in Spark Structured 
> Streaming runner
> ----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9451
>                 URL: https://issues.apache.org/jira/browse/BEAM-9451
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Ismaël Mejía
>            Priority: Major
>              Labels: structured-streaming
>
> The Spark Structured Streaming runner supports Datasets that already carry Schema 
> information. Spark uses this information to optimize jobs (via Catalyst). This 
> issue is to implement optimized translations of the runner's transforms so 
> that we can benefit from the performance improvements Spark makes internally.
> Note that we may also need to map Beam's core internal representations, such as 
> WindowedValue, so that intermediate steps can be optimized as well.
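
The following is a minimal, stdlib-only Java sketch of why schema information helps. It assumes a hypothetical, simplified `SimpleWindowedValue` class (not Beam's actual `WindowedValue`) and a hypothetical `WindowedValueSchema` helper. It does not use Spark or Beam APIs and is not the runner's implementation. Rather than keeping each element as one opaque serialized blob, the sketch flattens it into named columns, so a later step can select only the columns it needs by name. Catalyst can exploit this kind of structural knowledge, for example through column pruning.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for Beam's WindowedValue (NOT the real Beam class):
// one element plus its event timestamp and the bounds of a single window.
final class SimpleWindowedValue<T> {
    final T value;
    final long timestampMillis;
    final long windowStartMillis;
    final long windowEndMillis;

    SimpleWindowedValue(T value, long timestampMillis, long windowStartMillis, long windowEndMillis) {
        this.value = value;
        this.timestampMillis = timestampMillis;
        this.windowStartMillis = windowStartMillis;
        this.windowEndMillis = windowEndMillis;
    }
}

// Hypothetical helper that describes the flattened layout as a fixed list of named columns.
final class WindowedValueSchema {
    static final List<String> COLUMNS =
            Arrays.asList("value", "timestamp_millis", "window_start_millis", "window_end_millis");

    private WindowedValueSchema() {}

    // Flatten one element into a row whose positions follow COLUMNS.
    static Object[] toRow(SimpleWindowedValue<?> wv) {
        return new Object[] {wv.value, wv.timestampMillis, wv.windowStartMillis, wv.windowEndMillis};
    }

    // Return only the requested columns, in the requested order. Selecting by name works
    // only because the layout is known up front; an opaque byte[] offers no such view.
    static Object[] project(Object[] row, List<String> wanted) {
        Object[] out = new Object[wanted.size()];
        for (int i = 0; i < wanted.size(); i++) {
            int idx = COLUMNS.indexOf(wanted.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown column: " + wanted.get(i));
            }
            out[i] = row[idx];
        }
        return out;
    }
}
```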



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
