[jira] [Commented] (BEAM-1146) Decrease spark runner startup overhead

Kenneth Knowles (JIRA) Sun, 18 Dec 2016 20:16:51 -0800

    [ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760088#comment-15760088
 ]


Kenneth Knowles commented on BEAM-1146:
---------------------------------------

The Apex runner uses {{@Bind}} annotations to instruct Kryo to use Java 
serialization on {{Source}} fields. This annotation is not functional in the 
version of Kryo linked to the spark runner. Is it feasible to upgrade and 
follow this method?

For Coders, the most robust way to use the functionality already present in 
Beam is to instruct Kryo to serialize them via the {{asCloudObject}} plus 
Jackson method and deserialize via Jackson.

Given the somewhat different approaches here, perhaps this ticket should be 
split? These really are totally different, even though they seem similar - a 
{{Source}} is a language-specific user-definable function, while a {{Coder}} is 
a proxy for a language-independent binary format. That's why coders have a 
defined language-independent serialization (even though that serialization is 
just left over from before Beam at the moment).

> Decrease spark runner startup overhead
> --------------------------------------
>
>                 Key: BEAM-1146
>                 URL: https://issues.apache.org/jira/browse/BEAM-1146
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Aviem Zur
>            Assignee: Amit Sela
>
> BEAM-921 introduced a lazy singleton instantiated once in each machine 
> (driver & executors) which utilizes reflection to find all subclasses of 
> Source and Coder
> While this is beneficial in it's own right, the change added about one minute 
> of overhead in spark runner startup time (which cause the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}
> The reason reflection (specifically reflections library) was used here is 
> because  there is no current way of knowing all the source and coder classes 
> at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-1146) Decrease spark runner startup overhead

Reply via email to