[ https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760088#comment-15760088 ]
Kenneth Knowles commented on BEAM-1146: --------------------------------------- The Apex runner uses {{@Bind}} annotations to instruct Kryo to use Java serialization on {{Source}} fields. This annotation is not functional in the version of Kryo linked to the spark runner. Is it feasible to upgrade and follow this method? For Coders, the most robust way to use the functionality already present in Beam is to instruct Kryo to serialize them via the {{asCloudObject}} plus Jackson method and deserialize via Jackson. Given the somewhat different approaches here, perhaps this ticket should be split? These really are totally different, even though they seem similar - a {{Source}} is a language-specific user-definable function, while a {{Coder}} is a proxy for a language-independent binary format. That's why coders have a defined language-independent serialization (even though that serialization is just left over from before Beam at the moment). > Decrease spark runner startup overhead > -------------------------------------- > > Key: BEAM-1146 > URL: https://issues.apache.org/jira/browse/BEAM-1146 > Project: Beam > Issue Type: Improvement > Components: runner-spark > Reporter: Aviem Zur > Assignee: Amit Sela > > BEAM-921 introduced a lazy singleton instantiated once in each machine > (driver & executors) which utilizes reflection to find all subclasses of > Source and Coder > While this is beneficial in it's own right, the change added about one minute > of overhead in spark runner startup time (which cause the first job/stage to > take up to a minute). > The change is in class {{BeamSparkRunnerRegistrator}} > The reason reflection (specifically reflections library) was used here is > because there is no current way of knowing all the source and coder classes > at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)