[
https://issues.apache.org/jira/browse/BEAM-6935?focusedWorklogId=222673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222673
]
ASF GitHub Bot logged work on BEAM-6935:
----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Apr/19 22:49
Start Date: 03/Apr/19 22:49
Worklog Time Spent: 10m
Work Description: ibzib commented on pull request #8220: [BEAM-6935]
Spark portable runner: implement side inputs
URL: https://github.com/apache/beam/pull/8220
1. During translation, convert each side input's corresponding RDD into
bytes, which bytes are then broadcast to the Spark context. (This is necessary
because in Spark we cannot broadcast an RDD directly like how we can broadcast
a Dataset in Flink.)
2. Pass reference to the resulting Spark broadcast variable along with a
decoder for those bytes to the executable stage function.
3. When the executable stage function needs to read the side inputs, it gets
the broadcasted bytes value, then uses the decoder to convert the bytes to the
side input value.
Notes:
- Converted `FlinkBatchSideInputHandlerFactory` to a pluggable
`BatchSideInputHandlerFactory` to connect Spark implementation with portability
framework.
- For now, testing is still just a log statement with the side input value.
More formal verification is still a few PRs away.
R: @angoenka @iemejia
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
| --- | --- | --- | --- | --- | ---
Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
<br> [](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
| --- | --- | ---
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
--- |Java | Python | Go | Website
--- | --- | --- | --- | ---
Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
| --- | ---
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 222673)
Time Spent: 10m
Remaining Estimate: 0h
> Spark Portable Runner: Support side inputs
> ------------------------------------------
>
> Key: BEAM-6935
> URL: https://issues.apache.org/jira/browse/BEAM-6935
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
